Glossary of Terms.


1. Butterfly: The DFT computation of Figure 3.4, whose flowgraph has the appearance of a "butterfly".

2.  Fixed  Radix:  The  term  "radix"  is  commonly  used  to 
describe  a  specific  FFT  decomposition.  The  term 
"fixed"  radix  means  that  all  the  factors  of  N  are 
the  same. 

3. Mixed Radix: Not all the factors of N are identical.

4. Relatively Prime: The numbers in a given set are said to be relatively prime when no two numbers in the set share a common factor greater than 1. For example, (2, 3, 7, 9) is not a relatively prime set because 3 and 9 share the factor 3 (9 is divisible by 3 with no remainder). The following set is relatively prime:

(2, 3, 5, 7).
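A minimal sketch of the usual pairwise-coprime test (no two members of the set share a factor greater than 1), which is the property a prime factor decomposition relies on, is shown below. The helper name `relatively_prime` is hypothetical and is not taken from any program discussed in this paper.

```python
from itertools import combinations
from math import gcd

def relatively_prime(factors):
    """True when no two numbers in the set share a common factor greater than 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(factors, 2))

print(relatively_prime([2, 3, 7, 9]))  # False: 3 and 9 share the factor 3
print(relatively_prime([2, 3, 5, 7]))  # True
```

Note that (4, 5, 7, 9) also passes the test even though 4 and 9 are not prime, because no pair shares a factor.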

5.  Square  and  Square-free  Factors:  For  the  case  where 
N  =  4  •  3  •  7  •  4,  the  "4s"  are  square  factors  and 
the  3  and  7  are  square-free. 

6. Twiddle Factors: The term refers to the complex multipliers of Figure 3.8 which pre-multiply the FFT butterflies. They are sometimes called phase or rotation factors.


Abstract 


A comprehensive comparison of the most efficient Discrete Fourier Transform (DFT) techniques is presented. The DFT algorithms selected are the fixed radix Fast Fourier Transform (FFT), the mixed radix FFT, the Winograd Fourier Transform Algorithm (WFTA), and the Prime Factor Algorithm (PFA). Comparison of the algorithms is based on the number of real multiplications, real additions, and memory arrays required as a function of sequence length N. This paper reviews the literature, selects the most efficient DFT FORTRAN programs available, develops the number of real multiplications and additions as a function of N, and compares the algorithms using tables and plots of real multiplications, additions, and memory arrays. This comparison shows that the WFTA and PFA require the fewest real multiplications and additions, but the fixed radix and mixed radix FFTs require the least memory. The mixed radix FFT is much more flexible than the WFTA or PFA since N can be any sequence length. The WFTA and PFA are closely studied and the tradeoffs between the two are discussed. The PFA uses fewer additions but more multiplications for most sequence lengths, which means the WFTA is more efficient when multiplications are "costly" relative to additions.

The PFA uses less memory than the WFTA, making the PFA preferable when machine memory is limited. Based on the results of the paper, an algorithm is presented to select the most efficient DFT for an N-length sequence given the multiply speed, add speed, and memory size of the computer.


I.  Introduction 


1.1 Background

Computing the Discrete Fourier Transform (DFT) of N points has many applications in scientific and engineering calculations. In 1965 Cooley and Tukey described an algorithm which became known as the Fast Fourier Transform (FFT) because it reduced the number of complex operations required to compute the DFT from $N^2$ to $N \log_2 N$, where $N = 2^m$, $m$ an integer. Using ideas proposed in the Cooley-Tukey paper, a mixed radix algorithm was written and published in 1969 by Singleton which permitted N to be any positive integer length.

In 1976 Winograd proposed a mixed radix DFT algorithm which (1) converted the DFT to circular convolution, (2) used fast convolution algorithms to perform "short-DFTs", and (3) nested these short-DFTs into a structure to perform long Fourier transforms on complex data sequences. This algorithm became known as the Winograd Fourier Transform Algorithm (WFTA). The WFTA kept the real additions count at the FFT level while significantly reducing the number of real multiplications required.

Kolba and Parks, 1977, used Winograd's fast convolution algorithms and proposed a new Prime Factor Algorithm (PFA). This new algorithm modified the short-DFTs to use "shifts" instead of multiplications by 1/2 and did not use the nested structure of the WFTA. As a consequence, the PFA uses more real multiplications and fewer additions than the WFTA for a given sequence length N.

1.2  Problem 

Both Winograd, 1976, and Kolba and Parks, 1977, compared their operations counts to that of the FFT but did not include all possible WFTA and PFA sequence lengths. Further, no comparisons were made on the basis of the memory arrays required by each algorithm as a function of N. This paper presents a comprehensive comparison of fixed radix FFTs, mixed radix FFTs, the WFTA, and the PFA based on real operations and memory arrays. This comparison provides the information needed to select the most efficient algorithm to perform the DFT based on machine size, machine speed, and real operations.

1.3 Scope

This paper reviews the literature, selects DFT algorithms for comparison, studies the theory of each algorithm selected, develops the real operations and memory counts as a function of N, compares the algorithms using tables and plots of operation and memory counts, and presents an algorithm to select the most efficient technique.


The DFT algorithms selected for study and comparison are:

(1) Radix-2 FFT
(2) Radix-3 FFT
(3) Radix-3 FFT in the R(u) field
(4) Radix-5 FFT
(5) Mixed radix FFT written by the author
(6) Mixed radix FFT written by Singleton
(7) Mixed radix FFT available from the International Mathematical Subroutine Library (IMSL) on the CDC Cyber 74
(8) WFTA
(9) PFA.

Each  of  these  algorithms  has  a  particular  advantage  which 
makes  selection  of  the  best  algorithm  dependent  on  the 
machine  size,  machine  speed,  and  sequence  length. 

1.4 Assumptions

To a first approximation, the speed of an FFT algorithm is proportional to the number of complex multiplications used. The number of times the data array is indexed is, however, an important secondary factor (Singleton, 1969). Kolba and Parks, 1977, substantiated this assumption by timing the PFA and FFTs on an IBM 370/155 for several sequence lengths and showing that the FORTRAN coded PFA (having fewer real additions and multiplications) was faster than the FORTRAN FFT algorithms.
In 1978 Morris demonstrated that the sequence of arithmetic operations in a DFT algorithm's internal structure can result in different execution times "between ostensibly equivalent algorithms on a given machine" and that computer-dependent algorithm/architecture interactions may also alter the relative performance of the different algorithms. He modified the FORTRAN coded radix-4 FFT and WFTA programs, matched them to the PDP-11/55 and IBM 370/168 architectures, and showed that the WFTA offered neither time nor space advantages over the radix-4 FFT. Morris achieved these results because "the radix-4 FFT appears almost ideally matched to the PDP-11 architecture" whereas the WFTA "has extra load/store burdens" and requires extra data array indexing.

Morris demonstrated that it may be possible to optimize DFT algorithms to match a particular machine; however, this type of optimization of the FORTRAN DFT algorithms is outside the scope of this paper. It is assumed that existing FORTRAN coded DFT algorithms will not be modified and that selecting an algorithm which minimizes real operations produces the most efficient algorithm.

This  paper  derives  and  tabulates  real  operations 
counts  as  a  function  of  N  for  the  algorithms  listed  in 
Section  1.3.  The  most  efficient  DFT  algorithms  are  timed 
on  the  CDC  Cyber  74  computer  and  compared  to  the  predicted 
execution  time  based  on  real  operations.  These  predicted 
times  are  shown  to  be  consistent  with  the  timing  results. 


1.5 Approach and Presentation

A literature review is presented in Chapter II which starts with the 1965 Cooley-Tukey paper and follows the various DFT algorithm developments up through the 1977 Kolba-Parks article. The review puts Rader's 1968 landmark paper in perspective with Winograd's "nested" DFT algorithm and the subsequent work by Kolba and Parks.

Next, in Chapter III, the theory behind the DFT algorithms is reviewed, the real operations count is developed, and the memory array count needed for a sequence length N is determined. The general expressions for real operations and memory array counts are developed from published articles or from the background theory and are then plotted and tabulated as a function of N. Readers familiar with the FFT and Winograd background theory may wish to skip Sections 3.1 and 3.2.

In  Chapter  IV  comparison  tables  and  plots  of  the 
DFT  algorithms  make  it  possible  to  select  the  most 
efficient  algorithm  based  on  real  operations  and  memory 
array  required.  Timing  results  from  the  CDC  Cyber  74 
system  for  representative  sequence  lengths  are  tabulated 
to  substantiate  the  assumption  that  minimizing  real 
operations  equates  to  maximizing  efficiency.  An  algorithm 
is  also  presented  at  the  end  of  Chapter  IV  which  uses  the 
tables  in  this  paper  to  select  the  most  efficient  DFT 
technique  given  the  sequence  length,  memory  size,  and 
computer  add  and  multiply  speed. 


Conclusions and recommendations are presented in Chapter V.


II. Literature Review


The calculation of the Discrete Fourier Transform (DFT) is a central operation performed in digital signal processing, but it was not widely used for other than trivial sequence lengths because of the cumbersome DFT evaluation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, \exp(-j2\pi nk/N) \qquad (2.1)$$

which required on the order of $N^2$ complex operations.
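To make the $N^2$ cost of Eq (2.1) concrete, the following minimal Python sketch evaluates the sum term by term: each of the N outputs needs N complex multiply-adds. The name `dft_direct` is hypothetical and the code is only an illustration, not one of the programs compared in this paper.

```python
import cmath

def dft_direct(x):
    """Evaluate Eq (2.1) directly: N complex multiply-adds for each of the N outputs."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]

# A unit impulse transforms to a flat spectrum (all magnitudes equal to 1).
print([round(abs(v), 6) for v in dft_direct([1, 0, 0, 0])])
```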

In 1965 Cooley and Tukey published "An Algorithm for the Machine Calculation of Complex Fourier Series," which stimulated the widespread use of an algorithm that became known as the "Fast Fourier Transform" (FFT). Their paper proposed an efficient method of computing the DFT by factoring the sequence length N into its prime components:

$$N = n_1\, n_2 \cdots n_m \qquad (2.2)$$

and then decomposing Eq (2.1) into m steps with $N/n_i$ transformations of length $n_i$ within each step. If $n_1 = n_2 = \cdots = n_m = 2$, the operations are reduced from the previous $N^2$ level to the $N \log_2 N$ level.

Most of the early work on the FFT (Bergland, 1968) was directed toward the special cases where $N = 2^m$, which yielded simple and efficient algorithms. These algorithms are efficient because no multiplications are needed to evaluate the 2-point DFT butterflies, which can reduce the operations count below the $N \log_2 N$ level.
Other "fixed radix" algorithms were studied, and Dubois and Venetsanopoulos published "A New Radix-3 Algorithm" in 1978, which demonstrated that a radix-3 butterfly could be computed without multiplications by defining a new basis (1, u) instead of using the complex plane (1, i) basis, where u is a primitive complex cube root of unity. This technique was later shown to be limited to the special cases $N = 3^m$ and $N = 6^m$ (Burrus and Parks, 1979).
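Why no multiplications are needed can be seen in a small sketch. Assuming $u = e^{j2\pi/3}$ (so that $u^3 = 1$ and $u^2 = -1 - u$), a number stored as the pair (a, b), meaning $a + bu$, is multiplied by u with one negation and one subtraction. A 3-point DFT needs only such products, because $W_3 = e^{-j2\pi/3} = u^2$ and $W_3^2 = u$. This is a hypothetical Python illustration of the idea, not the Dubois-Venetsanopoulos program; all helper names are invented for the example.

```python
import cmath

# Assumption: u = exp(j*2*pi/3), a primitive cube root of unity, so u**2 = -1 - u.
U = cmath.exp(2j * cmath.pi / 3)

def add(p, q):
    """Add two numbers stored as (a, b), meaning a + b*u."""
    return (p[0] + q[0], p[1] + q[1])

def mul_u(p):
    """(a + b u) * u = a u + b u^2 = -b + (a - b) u: no multiplications needed."""
    a, b = p
    return (-b, a - b)

def mul_u2(p):
    """Multiply by u^2 (which equals W_3 = exp(-j*2*pi/3)) by applying mul_u twice."""
    return mul_u(mul_u(p))

def dft3_u_basis(x0, x1, x2):
    """3-point DFT with W_3 = u^2 and W_3^2 = u, using only additions and sign changes."""
    X0 = add(add(x0, x1), x2)
    X1 = add(add(x0, mul_u2(x1)), mul_u(x2))
    X2 = add(add(x0, mul_u(x1)), mul_u2(x2))
    return X0, X1, X2

def to_complex(p):
    """Convert the pair (a, b) back to the ordinary complex number a + b*u."""
    return p[0] + p[1] * U
```

With integer inputs the pairs stay integers, which is what makes the (1, u) basis attractive for exact arithmetic.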

Based on Cooley and Tukey's paper, "mixed-radix" algorithms were written by Brenner and by Singleton. The most efficient and popular of these algorithms was "An Algorithm for Computing the Mixed Radix Fast Fourier Transform," published in 1969 by Singleton, which is frequently used in digital signal processing where a wider choice of N is needed. The Singleton algorithm can perform the DFT of a sequence of any length N using FFT techniques but is most efficient when N is highly composite with factors from the set of integers 2, 3, 4, and 5. If N is a prime number, the algorithm performs a DFT using $N^2$ operations. The Singleton algorithm became the standard against which all later DFT techniques were measured.

In 1968 Rader presented "DFTs when the Number of Data Samples Is Prime," which showed that a DFT of prime length N contains an (N-1)-point circular convolution. He showed how to isolate the convolution by applying a permutation to the (N-1) signal points x(1), x(2), ..., x(N-1). He also gave the permutation applied to the complex multipliers from the set $\{\exp(-j2\pi nk/N),\ k = 1, 2, \ldots, N-1\}$. Both permutations were generated by using a "primitive" root, which exists when N is prime (McClellan and Rader, 1979). Rader's paper was largely overlooked for many years but took on new significance when Winograd presented his new DFT algorithm "On Computing the Discrete Fourier Transform" in 1976.
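A minimal Python sketch of Rader's permutation idea follows. It assumes a brute-force search for the primitive root g and uses the common convention $n = g^q$, $k = g^{-p}$ (mod N), under which each output $X(g^{-p})$ equals $x(0)$ plus one term of a cyclic convolution of the permuted inputs with the permuted multipliers. The cyclic convolution is evaluated directly here for clarity; in Winograd's approach a fast convolution algorithm would replace that step. All function names are hypothetical.

```python
import cmath

def primitive_root(N):
    """Smallest primitive root g of the prime N, found by brute force."""
    for g in range(2, N):
        if len({pow(g, q, N) for q in range(N - 1)}) == N - 1:
            return g
    return 1  # N = 2: the group {1} is generated by 1

def cyclic_convolution(a, b):
    """Direct L-point cyclic convolution c(p) = sum_q a(q) b((p - q) mod L)."""
    L = len(a)
    return [sum(a[q] * b[(p - q) % L] for q in range(L)) for p in range(L)]

def rader_dft(x):
    """DFT of prime length N computed through an (N-1)-point cyclic convolution."""
    N = len(x)
    g = primitive_root(N)
    g_inv = pow(g, N - 2, N)  # inverse of g mod N (Fermat's little theorem)

    def twiddle(e):
        return cmath.exp(-2j * cmath.pi * e / N)

    a = [x[pow(g, q, N)] for q in range(N - 1)]            # permuted inputs x(g^q)
    b = [twiddle(pow(g_inv, m, N)) for m in range(N - 1)]  # permuted multipliers W^(g^-m)
    c = cyclic_convolution(a, b)
    X = [0j] * N
    X[0] = sum(x)                           # k = 0 needs no multiplications
    for p in range(N - 1):
        X[pow(g_inv, p, N)] = x[0] + c[p]   # output index k = g^-p mod N
    return X
```

For N = 7 the primitive root found is 3, so the inputs are read in the order x(1), x(3), x(2), x(6), x(4), x(5).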

Winograd combined Rader's idea of converting a DFT to circular convolution with his own fast convolution algorithms to produce a new DFT method called the "Winograd Fourier Transform Algorithm" (WFTA). Winograd provided the fast convolution algorithms for short prime and prime-power length sequences and proposed that longer transforms be computed by "nesting" the short high-speed transforms. He presented a table comparing the WFTA to radix-2 FFT operations and showed that the number of additions remained at the FFT level while the number of multiplications was significantly reduced.

Kolba and Parks published "A Prime Factor FFT Algorithm Using High Speed Convolution" in 1977, which modified Winograd's fast convolution algorithms to permit "shifts" instead of multiplications by 1/2. They also replaced the nested structure of the WFTA with a conventional FFT decomposition. The decomposition of the sequence was based on an algorithm proposed by Thomas, 1963, in his article "Using a Computer to Solve Problems in Physics," which uses an index mapping based on the Chinese Remainder Theorem. Kolba and Parks selected several sequence lengths N and compared their operations counts to those of the WFTA and FFT.
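The index mapping can be illustrated for two relatively prime factors, $N = N_1 N_2$. The sketch below assumes the common choice of input map $n = (N_2 n_1 + N_1 n_2) \bmod N$ and a Chinese Remainder Theorem output map; with these maps the N-point DFT becomes a two-dimensional $N_1 \times N_2$ DFT with no twiddle factors between the two passes. The short DFTs are computed directly rather than with Winograd's fast convolution modules, and the function names are hypothetical. `pow` with exponent -1 and a modulus requires Python 3.8 or later.

```python
import cmath
import math

def dft(x):
    """Direct DFT, used here for the short N1- and N2-point transforms."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def pfa_two_factor(x, N1, N2):
    """Prime factor (Good-Thomas style) DFT for N = N1 * N2 with gcd(N1, N2) = 1."""
    N = N1 * N2
    if len(x) != N or math.gcd(N1, N2) != 1:
        raise ValueError("need len(x) == N1 * N2 with N1, N2 relatively prime")
    # Input map: n = (N2*n1 + N1*n2) mod N places x on an N1 x N2 grid.
    grid = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    # N2-point DFTs along each row; no twiddle factors are needed afterwards.
    grid = [dft(row) for row in grid]
    # N1-point DFTs along each column: cols[k2][k1].
    cols = [dft([grid[n1][k2] for n1 in range(N1)]) for k2 in range(N2)]
    # Output map (Chinese Remainder Theorem): k = k1 mod N1 and k = k2 mod N2.
    t1 = pow(N2, -1, N1)  # inverse of N2 modulo N1
    t2 = pow(N1, -1, N2)  # inverse of N1 modulo N2
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(N2 * t1 * k1 + N1 * t2 * k2) % N] = cols[k2][k1]
    return X
```

The check against `dft` on the full sequence is a convenient way to confirm that the two index maps are consistent.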

Paralleling Winograd's fast convolution work are the studies of number theoretic transforms (NTTs), which have been proposed for digital cyclic convolution and digital filtering. The NTTs were first published by Pollard, 1971, in "The Fast Fourier Transform in the Finite Field." He showed that a transform analogous to the DFT exists in the finite (or Galois) field, where the $\exp(j2\pi nk/N)$ terms in the DFT expression are replaced by $r^{nk}$ such that:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, r^{nk} \qquad (2.3)$$

Notice that Pollard chose the alternative definition of the DFT, in which the exponent of e is positive. The r term is defined in the Galois field (GF) such that the same cyclic convolution properties exist in GF as in the complex field for the DFT. He then proved that this analogous DFT could apply prime factor decomposition to the N-length sequence and perform $N/n_i$ transformations to reduce the operations in GF to the $N \log_2 N$ level, which provided the FFT in GF. Pollard proposed that this technique be applied to cyclic convolutions in GF, multiplication of polynomials over GF($p^m$), aperiodic convolution of integer sequences, multiplication of very large integers, division of polynomials over GF(p), and a chirp-z transform for NTTs (McClellan and Rader, 1979).
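The cyclic convolution property can be checked with a small sketch. The parameters are assumptions made for illustration: the prime p = 17 and r = 2, which has multiplicative order 8 modulo 17 and therefore serves as an 8th root of unity in GF(17). The forward transform follows Eq (2.3) with a positive exponent, as in Pollard's convention; the inverse uses $r^{-1}$ and $N^{-1}$ modulo p. All arithmetic is exact integer arithmetic, and the function names are hypothetical. `pow` with exponent -1 and a modulus requires Python 3.8 or later.

```python
P, R = 17, 2  # assumed field GF(17); 2 has order 8 mod 17, so N = 8

def ntt(x, r=R, p=P):
    """Eq (2.3): X(k) = sum_n x(n) r^(nk), all arithmetic modulo p."""
    N = len(x)
    return [sum(x[n] * pow(r, n * k, p) for n in range(N)) % p for k in range(N)]

def intt(X, r=R, p=P):
    """Inverse transform: use r^-1 and multiply by N^-1 (mod p)."""
    N = len(X)
    n_inv = pow(N, -1, p)
    return [(n_inv * v) % p for v in ntt(X, pow(r, -1, p), p)]

def cyclic_convolution_mod(a, b, p=P):
    """Direct N-point cyclic convolution, reduced modulo p."""
    N = len(a)
    return [sum(a[q] * b[(n - q) % N] for q in range(N)) % p for n in range(N)]

def cyclic_convolution_ntt(a, b, r=R, p=P):
    """Convolution theorem in GF(p): transform, multiply pointwise, inverse transform."""
    A, B = ntt(a, r, p), ntt(b, r, p)
    return intt([(u * v) % p for u, v in zip(A, B)], r, p)

a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
print(cyclic_convolution_ntt(a, b) == cyclic_convolution_mod(a, b))  # True
```

Because every value is an integer modulo p, the transform-domain result matches the direct convolution exactly, with no roundoff; the results are, of course, only meaningful modulo 17.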
Pollard's paper stimulated more study of the NTTs. Reed and Truong's 1975 paper, "The Use of Finite Fields to Compute Convolutions," includes complex-valued NTTs. It was shown that this NTT over GF($q^2$) can reduce convolution operations to the FFT level. If q is sufficiently large, the NTT can be used over GF($q^2$) to transform a sequence of complex integers x(n) into X(k) on GF($q^2$), for which the inverse transform of X(k) on GF($q^2$) is precisely the original sequence x(n). Using these ideas, filtering or convolution without roundoff errors can be obtained on a sequence of complex integers.

Most  applications  of  the  NTTs  have  been  in  the  areas 
of  digital  filtering  and  convolution.  The  author  was  not 
able  to  find  any  NTT  algorithm  which  could  be  compared  to 
the  FFT,  WFTA,  or  PFA  and  perform  all  the  same  functions 
as  these  three  algorithms. 

PFA, WFTA, and FFT represent the most efficient and flexible FORTRAN programs available to perform the DFT. Each algorithm has its own particular advantage over the other two, depending on machine size and speed, for a particular sequence length. None of the articles reviewed presents a comprehensive evaluation or comparison of the three algorithms based on the real operations and memory arrays required to perform a DFT for any sequence length N. This paper fills that need so that an efficient algorithm can


III. FFT Theory

The set of algorithms known as the Fast Fourier Transforms (FFTs) uses a variety of methods to reduce the computation time required to evaluate the Discrete Fourier Transform (DFT). The DFT is the central part of most spectrum analysis problems, and the FFT can improve performance by a factor of 100 or more over direct evaluation of the DFT (Rabiner and Gold, 1975). Therefore, the FFT is crucially important to digital signal processing techniques.

This chapter begins with "fixed radix" FFT algorithms by discussing a "decimation-in-time" algorithm, the data reordering (bit reversal) theory, the real operations (addition and multiplication) count, and a new fixed radix algorithm in the finite field, and then summarizes the memory required to use the fixed radix algorithms. Next, the conventional "mixed" radix algorithms are presented by discussing the theory, digit reversal, real operations count, and memory required to use the mixed radix algorithms. The chapter concludes with a discussion of mixed radix algorithms based on fast convolution. The theory, data reordering, real operations count, and memory are also presented for these algorithms.

Before the FFT algorithms are discussed, some comments must be made about computing the trigonometric function values needed to evaluate the FFT.
3.1 Computing Trigonometric Function Values

The trigonometric values used in FFTs can be represented as values on the unit circle. The values are based on integer powers of

$$\exp(-j2\pi/N)$$

which can be computed using sine and cosine functions. It is useful to have accurate methods of generating the sine and cosine terms other than the method of repeated use of library sine and cosine functions.

The method most widely used in FFT algorithms (Singleton, 1967) generates the trigonometric functions by a difference equation given by:

$$\cos((k+1)a) = \big(C \cos(ka) - S \sin(ka)\big) + \cos(ka)$$

$$\sin((k+1)a) = \big(C \sin(ka) + S \cos(ka)\big) + \sin(ka)$$

where

$$C = -2\sin^2(a/2), \qquad S = \sin(a), \qquad \cos(0) = 1, \qquad \sin(0) = 0.$$

This technique is used for all FFTs presented in this paper (except where noted otherwise) because it minimizes use of the FORTRAN library subroutines cos(·) and sin(·), thereby reducing the overall FFT computation time.
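Because $\cos a = 1 - 2\sin^2(a/2) = 1 + C$, the difference equation is the angle-addition formula rearranged so that a small correction is added to the previous value, which avoids forming $\cos a - 1$ by subtraction when a is small. A minimal Python sketch of the recurrence, checked against the library functions, is given below. The function name is hypothetical, and the step $a = 2\pi/N$ is an assumption matching powers of $\exp(-j2\pi/N)$; the FFT then uses $\cos(ka) - j\sin(ka)$.

```python
import math

def twiddles_by_recurrence(N):
    """Return [(cos(k*a), sin(k*a)) for k = 0..N-1] with a = 2*pi/N, built by the
    difference equation instead of N calls to math.cos / math.sin."""
    a = 2.0 * math.pi / N
    C = -2.0 * math.sin(a / 2.0) ** 2  # equals cos(a) - 1 without cancellation
    S = math.sin(a)
    c, s = 1.0, 0.0                    # cos(0) = 1, sin(0) = 0
    table = []
    for _ in range(N):
        table.append((c, s))
        # Both updates use the old (c, s): the right-hand side is evaluated first.
        c, s = (C * c - S * s) + c, (C * s + S * c) + s
    return table

N = 1024
a = 2.0 * math.pi / N
err = max(abs(c - math.cos(k * a)) + abs(s - math.sin(k * a))
          for k, (c, s) in enumerate(twiddles_by_recurrence(N)))
print(f"max deviation from library values: {err:.2e}")
```

Only two library calls (both to `math.sin`) are made per table, regardless of N.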
3.2 Fixed Radix Algorithms

While FFT algorithms are well known and widely used, they are relatively intricate and somewhat difficult to grasp at first reading. Two excellent textbooks (Rabiner and Gold, 1975; Oppenheim and Schafer, 1975) discuss the FFT theory in great detail and present FFTs based on decimation-in-time and decimation-in-frequency. Both texts spend a great deal of time discussing the radix-2 FFT, which is the most widely known and used. For this reason, the radix-2 development is presented here as a convenience for the reader; it also provides a theoretical background from which the other fixed radix algorithms are derived.

3.2.1 Development of Radix-2 Theory. To achieve the reduction in complex operations (a complex operation being defined as four real multiplications and two real additions) from $N^2$ to $N \log_2 N$, it is necessary to decompose the DFT computation into smaller and smaller DFT computations. In doing so, the symmetry and periodicity of the complex exponential $\exp(-j2\pi nk/N) = W_N^{nk}$ can be exploited. This radix-2 algorithm, known as a "decimation-in-time" algorithm (Oppenheim and Schafer, 1975), is based on decomposition of the sequence x(n) in the DFT expression:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, \exp(-j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1, \quad N = 2^m \qquad (3.1)$$

Since N is an even integer, X(k) can be computed by separating x(n) into two N/2-length sequences consisting of the even-numbered points and the odd-numbered points of x(n). Using n = 2r for n even and n = 2r+1 for n odd, Eq (3.1) becomes:


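$$X(k) = \sum_{r=0}^{T} x(2r)\, W_N^{2rk} + \sum_{r=0}^{T} x(2r+1)\, W_N^{(2r+1)k} \qquad (3.2)$$

where $T = (N/2) - 1$ and $W_N = \exp(-j2\pi/N)$. By expanding $W_N^{(2r+1)k}$ and factoring out $W_N^{k}$, Eq (3.2) can be rewritten as:

$$X(k) = \sum_{r=0}^{T} x(2r)\, (W_N^{2})^{rk} + W_N^{k} \sum_{r=0}^{T} x(2r+1)\, (W_N^{2})^{rk} \qquad (3.3)$$

But $W_N^{2} = \exp(-j4\pi/N) = \exp(-j2\pi/(N/2)) = W_{N/2}$, so Eq (3.3) can be written as:

$$X(k) = \sum_{r=0}^{T} x(2r)\, W_{N/2}^{rk} + W_N^{k} \sum_{r=0}^{T} x(2r+1)\, W_{N/2}^{rk} = G(k) + W_N^{k} H(k) \qquad (3.4)$$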

Each of the sums in Eq (3.4) is an N/2-point DFT, the first sum being over the even-numbered points of the original sequence and the second sum being over the odd-numbered points. Although the index k = 0, 1, ..., N-1, each of the sums in Eq (3.4) need only be computed for k = 0, 1, ..., (N/2)-1, since G(k) and H(k) are periodic in k with period N/2. After the two DFTs in Eq (3.4) are computed, they are combined to yield the N-point DFT, X(k). Figure 3.1 indicates the computation involved in computing X(k) according to Eq (3.4) for an eight-point sequence.

Figure 3.1. Flowgraph of the Decimation-in-Time Decomposition of an N-Point DFT Computation into Two N/2-Point DFT Computations (N = 8).

NOTE: The integers on the branches of the flowgraph represent the powers of $W_N$; i.e., the "4" represents $W_N^4$.

Figure 3.1 (Oppenheim and Schafer, 1975) uses the signal flow conventions such that branches entering a node are summed to produce the node variable. When no coefficient is shown, the branch transmittance is assumed to be one. For other branches, the transmittance of a branch is an integer power of $W_N$. Note in Figure 3.1 that two four-point DFTs are computed, yielding G(k) and H(k). X(0) is obtained by multiplying H(0) by $W_N^0$ and adding the product to G(0). X(1) is obtained by multiplying H(1) by $W_N^1$ and adding the result to G(1). For X(4) it would follow that H(4) is multiplied by $W_N^4$ and added to G(4); however, since G(k) and H(k) are both periodic in k with period 4, H(4) = H(0) and G(4) = G(0). Thus X(4) results from multiplying H(0) by $W_N^4$ and adding the product to G(0).

With the computation of the N-point DFT expressed by Eq (3.4), the number of computations can be compared with that of the direct DFT computation of Eq (3.1). For the direct computation, without using symmetry properties, $N^2$ complex multiplications were required. Eq (3.4) requires computation of two N/2-point DFTs, which require $2(N/2)^2$ complex multiplications and about $2(N/2)^2$ complex additions (Oppenheim and Schafer, 1975). The two N/2-point DFTs must then be combined, requiring N complex multiplications, corresponding to multiplying the second sum by $W_N^k$, and then N complex additions, corresponding to adding the product to the first sum. As a result, the computation of Eq (3.4) for all values of k requires $N + 2(N/2)^2$, or $N + N^2/2$, complex multiplications and additions. For N > 2, $N + N^2/2$ is less than $N^2$.
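As a worked check of this count, for the eight-point example of Figure 3.1 and for a longer sequence of N = 1024 points:

$$N = 8: \quad N + \frac{N^2}{2} = 8 + 32 = 40 \;<\; N^2 = 64$$

$$N = 1024: \quad N + \frac{N^2}{2} = 1024 + 524\,288 = 525\,312 \;<\; N^2 = 1\,048\,576$$

so for large N a single decimation step already roughly halves the work.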

The expression in Eq (3.4) corresponds to decimating the original N-point sequence into odd and even N/2-point sequences. Since $N = 2^m$, the N/2-point sequences also have even length, so each of G(k) and H(k) can be further decimated into two N/4-point DFTs, which could then be combined to yield the N/2-point DFTs. With $g(r) = x(2r)$ and $h(r) = x(2r+1)$, decimating the N/2-point sequences in Eq (3.4) into N/4-point sequences gives:

$$G(k) = \sum_{r=0}^{(N/2)-1} g(r)\, W_{N/2}^{rk} = \sum_{p=0}^{(N/4)-1} g(2p)\, W_{N/2}^{2pk} + \sum_{p=0}^{(N/4)-1} g(2p+1)\, W_{N/2}^{(2p+1)k}$$

Letting $R = (N/4) - 1$,

$$G(k) = \sum_{p=0}^{R} g(2p)\, W_{N/4}^{pk} + W_{N/2}^{k} \sum_{p=0}^{R} g(2p+1)\, W_{N/4}^{pk} \qquad (3.5)$$

Similarly,

$$H(k) = \sum_{p=0}^{R} h(2p)\, W_{N/4}^{pk} + W_{N/2}^{k} \sum_{p=0}^{R} h(2p+1)\, W_{N/4}^{pk} \qquad (3.6)$$

If the four-point DFTs in Figure 3.1 are computed using Eqs (3.5) and (3.6), then that computation would be carried out as indicated in Figure 3.2. Inserting the computation in Figure 3.2 into the flowgraph of Figure 3.1 produces the complete flowgraph in Figure 3.3. Note that $W_{N/2} = W_N^2$ has been used.
For the 8-point DFT that has been used as an example, the computation has been reduced to a computation of N/4-point DFTs, where N/4 = 2. An example 2-point DFT, for x(0) and x(4), is shown in Figure 3.4. The complete flowgraph for the computation of the 8-point DFT is shown in Figure 3.5; it was obtained by taking the computation of Figure 3.4 and inserting it in Figure 3.3.

Considering the more general case with N a power of 2 greater than 3, the same decimation procedure would be continued by decomposing the N/4-point transforms in Eqs (3.5) and (3.6) into N/8-point transforms. This requires v stages of computation, where $v = \log_2 N$. Recall that in the original decomposition of the N-point transform into two N/2-point transforms, the number of complex multiplications and additions required was $N + 2(N/2)^2$. When the N/2-point transforms were decomposed into N/4-point transforms, the factor $(N/2)^2$ was replaced by $N/2 + 2(N/4)^2$, so that the overall computation now requires $N + N + 4(N/4)^2$ complex multiplications and additions. If $N = 2^v$, this can be done at most $v = \log_2 N$ times, "so that after carrying out this decomposition as many times as possible the number of complex multiplications and additions is equal to $N \log_2 N$" (Oppenheim and Schafer, 1975).
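The repeated decimation can be sketched as a recursive Python function that applies Eq (3.4) at every level and counts one complex multiplication by $W_N^k$ for each of the N outputs of a size-N stage, including the trivial $W_N^0$, as the count above does. It is a hypothetical illustration of the decomposition (it recomputes each $W_N^k$ with `cmath.exp` for clarity), not one of the FORTRAN programs compared in this paper. For $N = 2^v$ the counter returns $N \log_2 N$.

```python
import cmath

def fft_dit_recursive(x):
    """Radix-2 decimation-in-time DFT via Eq (3.4); len(x) must be a power of 2.
    Returns (X, complex_mults), where complex_mults counts every W_N^k product."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("N must be a power of 2")
    if N == 1:
        return [complex(x[0])], 0            # a 1-point DFT is the sample itself
    G, mults_g = fft_dit_recursive(x[0::2])  # N/2-point DFT of even-numbered points
    H, mults_h = fft_dit_recursive(x[1::2])  # N/2-point DFT of odd-numbered points
    half = N // 2
    X = []
    for k in range(N):
        # G and H are periodic with period N/2, so index them with k mod N/2.
        W = cmath.exp(-2j * cmath.pi * k / N)
        X.append(G[k % half] + W * H[k % half])
    return X, mults_g + mults_h + N

X, mults = fft_dit_recursive(list(range(8)))
print(mults)  # 24, i.e. N log2 N for N = 8
```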

The flowgraph of Figure 3.5 displays the operations explicitly. By counting branches with transmittances of the form $W_N^r$, it is seen that each stage has N complex multiplications and N complex additions. Since there are $\log_2 N$ stages, there are a total of $N \log_2 N$ complex multiplications and additions, as shown before. Further reductions in the complex operations count can be achieved by exploiting the symmetry and periodicity of $W_N^r$.

Note that in each "stage" of Figure 3.5 the computation takes a set of N complex numbers and transforms them into another set of N complex numbers. This process is repeated $v = \log_2 N$ times, resulting in the DFT computation. For example, in computing the first stage of Figure 3.5, one set of storage registers would contain the input data sequence and a second set of storage registers would contain the computed results for the first stage. The sequence of numbers resulting from the m-th stage of computation is denoted as $X_m(i)$, where i = 0, 1, ..., N-1 and m = 1, 2, ..., v. For the following stage, the previous output array, $X_m(i)$, becomes the input array and the new output array is $X_{m+1}(i)$ for the (m+1)-th stage of computation. Using this notation, it can be seen that the basic flowgraph in Figure 3.5 is given by Figure 3.6. Using the notation of Figure 3.6, the equations of the butterfly are given by:

$$X_{m+1}(p) = X_m(p) + W_N^{r}\, X_m(q) \qquad (3.7)$$

$$X_{m+1}(q) = X_m(p) + W_N^{r+N/2}\, X_m(q) \qquad (3.8)$$

Because of the appearance of Figure 3.6, the computations of Eqs (3.7) and (3.8) are referred to as the "butterfly" computations.
The number of complex multiplications can be reduced by a factor of 2 using the symmetry

$$W_N^{N/2} = \exp(-j(2\pi/N) \cdot N/2) = \exp(-j\pi) = -1 \qquad (3.9)$$

so that $W_N^{r+N/2} = -W_N^{r}$ and Eqs (3.7) and (3.8) become:

$$X_{m+1}(p) = X_m(p) + W_N^{r}\, X_m(q) \qquad (3.10)$$

$$X_{m+1}(q) = X_m(p) - W_N^{r}\, X_m(q) \qquad (3.11)$$

Eqs (3.10) and (3.11) are shown in Figure 3.7, which places
the "twiddle factor" out front in the butterfly. Since
there are N/2 "butterflies" of the form of Figure 3.7 per
stage and $\log_2 N$ stages, the total number of complex
multiplications required is $(N/2)\log_2 N$ instead of the
$N\log_2 N$ used in Figure 3.5. Using the "twiddle factor"
butterfly flowgraph of Figure 3.7 as a replacement for the
butterfly of Figure 3.4, Figure 3.8 is obtained.
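The following minimal Python sketch (an illustration, not the FORTRAN program of Appendix A) implements an iterative radix-2 decimation-in-time FFT from the twiddle-factor butterfly of Eqs (3.10) and (3.11), with the input stored in bit-reversed order. The names `fft_radix2`, `bit_reverse`, and `dft` are hypothetical helpers introduced here; the counter `mults` tallies one complex multiplication per butterfly so the total can be compared with $(N/2)\log_2 N$.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` binary digits of index."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (index & 1)
        index >>= 1
    return rev


def dft(x):
    """Direct DFT, X(k) = sum_n x(n) W_N^{nk}, used only as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT built from the butterfly
    X(p) + W X(q), X(p) - W X(q) of Eqs (3.10) and (3.11).

    Returns (X, mults); mults counts the complex products W_N^r * X_m(q),
    including the trivial ones with W_N^0 = 1.
    """
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    stages = n.bit_length() - 1                 # v = log2 N
    a = [complex(x[bit_reverse(i, stages)]) for i in range(n)]
    mults = 0
    span = 1                                    # distance between p and q
    while span < n:
        step = 2 * span
        for j in range(span):
            w = cmath.exp(-2j * cmath.pi * j / step)   # twiddle factor W_N^r
            for p in range(j, n, step):
                q = p + span
                t = w * a[q]                    # the single complex multiply
                mults += 1
                a[p], a[q] = a[p] + t, a[p] - t
        span = step
    return a, mults
```

For N = 8 the counter returns 12, that is, (8/2)·3. The sketch deliberately keeps the trivial multiplications by $W_N^0 = 1$, so it counts one multiplication for every butterfly.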

3.2.2 Development of Radix-3 FFT Theory. Starting
with the restriction that the N-point sequence length be an
integer power of three ($N = 3^m$, $m = 1, 2, 3, \ldots$), the
DFT X(k) is computed by separating the discrete time
sequence x(n) into three N/3 point sequences. X(k) is
given by the DFT expression:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} \qquad (3.12)$$

where $k = 0, 1, \ldots, N-1$ and $W_N = \exp(-j2\pi/N)$.


Breaking x(n) into three N/3 point sequences yields x(3r),
x(3r+1), and x(3r+2). Substituting these into Eq (3.12)
and adjusting the upper limits of the respective summations to (N/3)-1 yields:

$$X(k) = \sum_{r=0}^{P} x(3r)\, W_N^{(3r)k} + \sum_{r=0}^{P} x(3r+1)\, W_N^{(3r+1)k} + \sum_{r=0}^{P} x(3r+2)\, W_N^{(3r+2)k} \qquad (3.13)$$

where $P = (N/3) - 1$.

By regrouping the exponents, Eq (3.13) can be
rewritten as:

$$X(k) = \sum_{r=0}^{P} x(3r)\, W_N^{3rk} + W_N^{k} \sum_{r=0}^{P} x(3r+1)\, W_N^{3rk} + W_N^{2k} \sum_{r=0}^{P} x(3r+2)\, W_N^{3rk} \qquad (3.14)$$

By rewriting $W_N^3$ as:

$$W_N^{3} = \exp(-j6\pi/N) = \exp\big(-j2\pi/(N/3)\big) = W_{N/3} \qquad (3.15)$$

Eq (3.14) can be expressed as:

$$X(k) = \sum_{r=0}^{P} x(3r)\, W_{N/3}^{rk} + W_N^{k} \sum_{r=0}^{P} x(3r+1)\, W_{N/3}^{rk} + W_N^{2k} \sum_{r=0}^{P} x(3r+2)\, W_{N/3}^{rk} \qquad (3.16)$$

Each of the sums in Eq (3.16) represents an N/3 point DFT:
the first is the N/3 point DFT of the 3r points of the
original sequence, the second that of the 3r+1 points, and
the third that of the 3r+2 points of the original sequence.
Although the index k of X(k) ranges over N values
($k = 0, 1, \ldots, N-1$), each summation in Eq (3.16) is periodic in k
with period N/3 and therefore needs to be computed only for
$k = 0, 1, \ldots, (N/3)-1$. Eq (3.16) can be rewritten to reflect this:

$$X(k) = F(k) + W_N^{k} G(k) + W_N^{2k} H(k) \qquad (3.17)$$
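As a check on Eq (3.17), the Python sketch below (hypothetical helpers `dft` and `radix3_first_stage`, assuming N is divisible by 3) computes F(k), G(k), and H(k) as direct N/3-point DFTs of x(3r), x(3r+1), and x(3r+2), then combines them with $W_N^k$ and $W_N^{2k}$, reusing the N/3 computed values through periodicity.

```python
import cmath


def dft(x):
    """Direct DFT of Eq (3.12): X(k) = sum_n x(n) W_N^{nk}."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def radix3_first_stage(x):
    """One decimation-in-time step of Eq (3.17): the N-point DFT from the
    three N/3-point DFTs F, G, H of x(3r), x(3r+1), x(3r+2)."""
    n = len(x)
    if n % 3:
        raise ValueError("length must be divisible by 3")
    F = dft(x[0::3])
    G = dft(x[1::3])
    H = dft(x[2::3])
    third = n // 3
    X = []
    for k in range(n):
        w = cmath.exp(-2j * cmath.pi * k / n)       # W_N^k
        i = k % third                               # F, G, H have period N/3
        X.append(F[i] + w * G[i] + w * w * H[i])    # w * w = W_N^{2k}
    return X
```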

Eq (3.17) can be implemented as the butterfly flowgraph
in Figure 3.9 using the accepted notational conventions
(Oppenheim and Schafer, 1975). The convention used for
the flowgraph is that when no coefficient is shown, the branch
transmittance is assumed to be one. For other branches the
transmittance (multiplier) is an integer power of $W_N$.
In Figure 3.9 there are three N/3 point DFTs: F(k)
designates the three point DFT of the 3r points, G(k)
designates the three point DFT of the 3r+1 points, and H(k)
designates the DFT of the 3r+2 points, where
r = 0, 1, ..., (N/3)-1.

X(0) is obtained by (1) multiplying H(0) by a branch
transmittance of 1 (which equals $W_9^0$), (2) multiplying
G(0) by 1, (3) multiplying F(0) by 1, and (4) summing the
three. Likewise, X(1) is obtained by multiplying H(1) by $W_9^2$,
multiplying G(1) by $W_9^1$, and adding the results to F(1).
X(6) has H(6) multiplied by $W_9^{12}$ and G(6) multiplied by $W_9^{6}$,
and the products added to F(6), giving:

$$X(6) = F(6) + W_9^{6} G(6) + W_9^{12} H(6) \qquad (3.18)$$

However, since F(k), G(k), and H(k) are all periodic in
k with period N/3 = 3, the periodicity can be exploited to
yield F(6) = F(0), G(6) = G(0), and H(6) = H(0). These
results can be substituted into Eq (3.18) to give:

$$X(6) = F(0) + W_9^{6} G(0) + W_9^{12} H(0) \qquad (3.19)$$

Continuing to use the periodic properties, the
results for X(0) through X(8) are:

$$\begin{aligned}
X(0) &= F(0) + G(0) + H(0) && (3.20)\\
X(1) &= F(1) + W_9^{1} G(1) + W_9^{2} H(1) && (3.21)\\
X(2) &= F(2) + W_9^{2} G(2) + W_9^{4} H(2) && (3.22)\\
X(3) &= F(0) + W_9^{3} G(0) + W_9^{6} H(0) && (3.23)\\
X(4) &= F(1) + W_9^{4} G(1) + W_9^{8} H(1) && (3.24)\\
X(5) &= F(2) + W_9^{5} G(2) + W_9^{10} H(2) && (3.25)\\
X(6) &= F(0) + W_9^{6} G(0) + W_9^{12} H(0) && (3.26)\\
X(7) &= F(1) + W_9^{7} G(1) + W_9^{14} H(1) && (3.27)\\
X(8) &= F(2) + W_9^{8} G(2) + W_9^{16} H(2) && (3.28)
\end{aligned}$$


Eqs (3.20) through (3.28) conclude the first stage decimation
of the 9-point sequence. The DFT computation has been
reduced to computations of N/3-point DFTs, where N/3 = 3.

An example 3-point DFT for x(0), x(3), and x(6) is shown in
Figure 3.10. The complete flowgraph for the computation of
the 9-point DFT is shown in Figure 3.11 and was obtained by
substituting the computation of Figure 3.10 into Figure 3.9.

Considering the more general case with N a power of 3
greater than 9, the same decimation procedure would be
continued by decomposing each of the N/3 point DFTs F(k),
G(k), and H(k) into N/9 point DFTs. The N/3 point DFT F(k) is:

$$F(k) = \sum_{r=0}^{(N/3)-1} f(r)\, W_{N/3}^{rk}, \qquad f(r) = x(3r) \qquad (3.29)$$

This equation, letting $Q = (N/9) - 1$, can be divided into
three N/9 length sequences:

$$F(k) = \sum_{i=0}^{Q} f(3i)\, W_{N/3}^{3ik} + \sum_{i=0}^{Q} f(3i+1)\, W_{N/3}^{(3i+1)k} + \sum_{i=0}^{Q} f(3i+2)\, W_{N/3}^{(3i+2)k} \qquad (3.30)$$


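Expanding the exponents, Eq (3.30) can be rewritten:

$$F(k) = \sum_{i=0}^{Q} f(3i)\, W_{N/3}^{3ik} + W_{N/3}^{k} \sum_{i=0}^{Q} f(3i+1)\, W_{N/3}^{3ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} f(3i+2)\, W_{N/3}^{3ik} \qquad (3.31)$$

Using the substitution $W_{N/3}^{3} = W_{N/9}$:

$$F(k) = \sum_{i=0}^{Q} f(3i)\, W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} f(3i+1)\, W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} f(3i+2)\, W_{N/9}^{ik} \qquad (3.32)$$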


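Similar expressions for G(k) and H(k) can be derived:

$$G(k) = \sum_{i=0}^{Q} g(3i)\, W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} g(3i+1)\, W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} g(3i+2)\, W_{N/9}^{ik} \qquad (3.33)$$

$$H(k) = \sum_{i=0}^{Q} h(3i)\, W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} h(3i+1)\, W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} h(3i+2)\, W_{N/9}^{ik} \qquad (3.34)$$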

Eqs (3.32) through (3.34) can be used to identify the
elements of a radix-3 butterfly flowgraph.

Letting N = 9, the expressions for F(k), G(k), and H(k) become:

$$\begin{aligned}
F(0) &= f(0) + W_3^{0} f(1) + W_3^{0} f(2)\\
F(1) &= f(0) + W_3^{1} f(1) + W_3^{2} f(2)\\
F(2) &= f(0) + W_3^{2} f(1) + W_3^{4} f(2)
\end{aligned} \qquad (3.35)$$

$$\begin{aligned}
G(0) &= g(0) + W_3^{0} g(1) + W_3^{0} g(2)\\
G(1) &= g(0) + W_3^{1} g(1) + W_3^{2} g(2)\\
G(2) &= g(0) + W_3^{2} g(1) + W_3^{4} g(2)
\end{aligned} \qquad (3.36)$$

$$\begin{aligned}
H(0) &= h(0) + W_3^{0} h(1) + W_3^{0} h(2)\\
H(1) &= h(0) + W_3^{1} h(1) + W_3^{2} h(2)\\
H(2) &= h(0) + W_3^{2} h(1) + W_3^{4} h(2)
\end{aligned} \qquad (3.37)$$

From Eqs (3.35) through (3.37) the butterfly multipliers are
derived (consistent with Oppenheim and Schafer) to be:

$$X(k) = F(k) + W_N^{k} G(k) + W_N^{2k} H(k) \qquad (3.38)$$

$$X(k+r) = F(k) + W_N^{k+r} G(k) + W_N^{2k+2r} H(k) \qquad (3.39)$$

$$X(k+2r) = F(k) + W_N^{k+2r} G(k) + W_N^{2k+4r} H(k) \qquad (3.40)$$


where r represents the distance between the endpoints of
the butterfly. In Figure 3.11, r = 1 for stage 1 and r = 3 for
stage 2. Eqs (3.38) through (3.40) are represented in
Figure 3.12, which is the general radix-3 butterfly
flowgraph.

The exponents of Figure 3.12 can be rewritten as:

$$W_N^{k+r} = W_N^{k} W_N^{r} \qquad (3.41)$$

$$W_N^{2k+2r} = W_N^{2k} W_N^{2r} \qquad (3.42)$$

$$W_N^{k+2r} = W_N^{k} W_N^{2r} \qquad (3.43)$$

$$W_N^{2k+4r} = W_N^{2k} W_N^{4r} \qquad (3.44)$$

With these expressions for the butterfly multipliers, an
alternative arrangement to Figure 3.12 is possible by
"premultiplying" or "twiddling" the butterfly inputs G(k) and
H(k) (Gentleman and Sande, 1966). The multipliers $W_N^{k}$
and $W_N^{2k}$ represent the twiddle factors of the butterfly
in Figure 3.13. Since N = 3r (Oppenheim and Schafer, 1975),
the butterfly multipliers can be reduced to:

$$W_N^{r} = \exp(-j2\pi r/3r) = \exp(-j2\pi/3) = -0.5 - j0.866 \qquad (3.45)$$

$$W_N^{2r} = \exp(-j4\pi/3) = -0.5 + j0.866 \qquad (3.46)$$

$$W_N^{4r} = \exp(-j8\pi/3) = -0.5 - j0.866 \qquad (3.47)$$


Oppenheim and Schafer observed that the alternate twiddle
factor version in Figure 3.13 offers no advantage over Figure 3.12
because "$\exp(-j2\pi/3)$ and all the powers thereof
are complex coefficients that require multiplications".
However, for the particular FORTRAN FFT radix-3 programs
which implemented Figures 3.12 and 3.13, the twiddle factor
version of the radix-3 was much more efficient to
implement: only two twiddle factors ($W_N^{k}$ and $W_N^{2k}$) had to
be computed per butterfly, and the butterfly multipliers were
the constants in Eqs (3.45) and (3.46), whereas the original version
of Figure 3.12 requires that all six complex multipliers be
computed for each butterfly. The twiddle factor version
represents a simplification over the original radix-3
butterfly.
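A recursive Python sketch of the twiddle-factor arrangement of Figure 3.13 follows. It is an illustration under stated assumptions rather than the radix-3 FORTRAN program: it recurses on the three decimated subsequences instead of working in place on digit-reversed data, and the names `fft_radix3`, `W_R`, and `W_2R` are hypothetical. Each butterfly evaluates only the two twiddle factors $W_N^k$ and $W_N^{2k}$; the butterfly multipliers are the constants of Eqs (3.45) through (3.47).

```python
import cmath

# Constant butterfly multipliers of Eqs (3.45)-(3.47); valid because N = 3r
# inside every butterfly.
W_R = cmath.exp(-2j * cmath.pi / 3)    # W_N^r = W_N^{4r} = -0.5 - j0.866
W_2R = cmath.exp(-4j * cmath.pi / 3)   # W_N^{2r}         = -0.5 + j0.866


def fft_radix3(x):
    """Recursive radix-3 decimation-in-time FFT using the twiddle-factor
    butterfly: only W_N^k and W_N^{2k} are evaluated per butterfly."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n < 1 or n % 3:
        raise ValueError("length must be a power of 3")
    F = fft_radix3(x[0::3])
    G = fft_radix3(x[1::3])
    H = fft_radix3(x[2::3])
    r = n // 3                                         # butterfly span
    X = [0j] * n
    for k in range(r):
        g = cmath.exp(-2j * cmath.pi * k / n) * G[k]   # twiddle W_N^k
        h = cmath.exp(-4j * cmath.pi * k / n) * H[k]   # twiddle W_N^{2k}
        X[k] = F[k] + g + h                            # Eq (3.38)
        X[k + r] = F[k] + W_R * g + W_2R * h           # Eq (3.39)
        X[k + 2 * r] = F[k] + W_2R * g + W_R * h       # Eq (3.40), W^{4r} = W^r
    return X
```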

3.2.3 Radix-5 Theory. The theory for the radix-5
algorithm follows a development similar to the radix-3.
Because of this similarity, only the radix-5 results are
given here for comparison to the radix-3; readers interested
in the detailed development are referred to Appendix D.

The basic butterfly equations for the radix-5 are
given by:

$$X(k) = A(k) + W_N^{k} B(k) + W_N^{2k} C(k) + W_N^{3k} D(k) + W_N^{4k} E(k) \qquad (3.48)$$

$$X(k+r) = A(k) + W_N^{k+r} B(k) + W_N^{2k+2r} C(k) + W_N^{3k+3r} D(k) + W_N^{4k+4r} E(k) \qquad (3.49)$$

$$X(k+2r) = A(k) + W_N^{k+2r} B(k) + W_N^{2k+4r} C(k) + W_N^{3k+6r} D(k) + W_N^{4k+8r} E(k) \qquad (3.50)$$

$$X(k+3r) = A(k) + W_N^{k+3r} B(k) + W_N^{2k+6r} C(k) + W_N^{3k+9r} D(k) + W_N^{4k+12r} E(k) \qquad (3.51)$$

$$X(k+4r) = A(k) + W_N^{k+4r} B(k) + W_N^{2k+8r} C(k) + W_N^{3k+12r} D(k) + W_N^{4k+16r} E(k) \qquad (3.52)$$

Eqs (3.48) through (3.52) are shown in the twiddle
factor butterfly of Figure 3.14, where r is the distance
between the butterfly endpoints. Since N = 5r, the butterfly
multipliers reduce to the constant complex multipliers:


$$W_N^{r} = W_N^{6r} = W_N^{16r} = \cos(2\pi/5) - j\sin(2\pi/5)$$

$$W_N^{2r} = W_N^{12r} = \cos(4\pi/5) - j\sin(4\pi/5)$$

$$W_N^{3r} = W_N^{8r} = \cos(4\pi/5) + j\sin(4\pi/5)$$

$$W_N^{4r} = W_N^{9r} = \cos(2\pi/5) + j\sin(2\pi/5)$$

These constant butterfly multipliers are computed once
during the FFT computation and used in every radix-5
butterfly.
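The reduction of the radix-5 butterfly multipliers can be checked numerically. In the sketch below (hypothetical names `radix5_butterfly_multipliers`, `C1` through `C4`, and `EXPECTED`), term t of output $X(k + sr)$ carries the constant factor $W_N^{tsr}$; with $N = 5r$ this equals $\exp(-j2\pi ts/5)$, so only $ts \bmod 5$ matters and the nine distinct multiples of r collapse to the four constants listed above.

```python
import cmath
import math


def radix5_butterfly_multipliers():
    """Map each multiple e of r in Eqs (3.48)-(3.52) to W_N^{e r}, N = 5r.

    Term t (t = 1..4) of output X(k + s r), s = 1..4, has the constant
    factor W_N^{(t s) r} = exp(-j 2 pi (t s) / 5).
    """
    multiples = sorted({t * s for t in range(1, 5) for s in range(1, 5)})
    return {e: cmath.exp(-2j * math.pi * e / 5) for e in multiples}


# The four distinct constants listed above.
C1 = complex(math.cos(2 * math.pi / 5), -math.sin(2 * math.pi / 5))
C2 = complex(math.cos(4 * math.pi / 5), -math.sin(4 * math.pi / 5))
C3 = complex(math.cos(4 * math.pi / 5), math.sin(4 * math.pi / 5))
C4 = complex(math.cos(2 * math.pi / 5), math.sin(2 * math.pi / 5))
EXPECTED = {1: C1, 6: C1, 16: C1, 2: C2, 12: C2, 3: C3, 8: C3, 4: C4, 9: C4}
```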

3.2.4 Digit Reversal Algorithm. In order for the
DFT to be computed as discussed above, the input data must
be stored in nonsequential order. In fact, the input data
are stored in "bit-reversed" order for the radix-2 FFT and in
"digit-reversed" order for the other fixed-radix algorithms.
To see what is meant by this terminology, note that for the
8-point radix-2 flowgraph of Figure 3.8, three binary digits
are required to index through the data array. Writing the
input indices of $X_0$ in binary form and then reversing the
order of the bits gives:

$$\begin{aligned}
X_0(0) &= X_0(000) = x(000) = x(0)\\
X_0(1) &= X_0(001) = x(100) = x(4)\\
X_0(2) &= X_0(010) = x(010) = x(2)\\
X_0(3) &= X_0(011) = x(110) = x(6)\\
&\;\;\vdots\\
X_0(7) &= X_0(111) = x(111) = x(7)
\end{aligned} \qquad (3.53)$$

If $(n_2 n_1 n_0)$ is the binary representation of the index of
the sequence x(n), then the sequence value $x(n_2 n_1 n_0)$ is stored
in array position $X_0(n_0 n_1 n_2)$. In determining the
position of $x(n_2 n_1 n_0)$ in the input array, the bits of
index n must be reversed in order.
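A minimal Python sketch of bit reversal (the function name `bit_reverse` is hypothetical) reproduces the storage order of Eq (3.53) for N = 8:

```python
def bit_reverse(index, bits):
    """Reverse the order of the lowest `bits` binary digits of index."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (index & 1)   # move the current low bit to the top
        index >>= 1
    return rev


# Input storage order for the 8-point radix-2 FFT: X_0(i) = x(bit_reverse(i, 3)).
order = [bit_reverse(i, 3) for i in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```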

For the radix-3 FFT the input array must be in a
similar nonsequential order. The order is determined by
"digit reversing" the input sequence index using a modulo-3
counter. The digit-reversed radix-3 FFT example where N = 9
is shown in Figure 3.15. The modulo-3 counter is given by:

$$\text{COUNT} = (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \qquad (3.54)$$

where $b_i = 0, 1, 2$. The reversed count is given by:

$$\text{REVCOUNT} = (b_0 \cdot 3^1) + (b_1 \cdot 3^0) \qquad (3.55)$$

Eqs (3.54) and (3.55) show the modulo-3 counter for N = 9,
which requires only two digits, $b_1$ and $b_0$, to represent the
input sequence. For the case where $N = 3^3 = 27$, three digits are
needed to represent the input sequence x(n), and the modulo-3
counter becomes:

$$\text{COUNT} = (b_2 \cdot 3^2) + (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \qquad (3.56)$$

and the reversed digit counter is:

$$\text{REVCOUNT} = (b_0 \cdot 3^2) + (b_1 \cdot 3^1) + (b_2 \cdot 3^0) \qquad (3.57)$$

Similarly, the general expressions for COUNT and REVCOUNT
can be given, where $N = 3^m$ and $b_i = 0, 1, 2$:

$$\text{COUNT} = (b_{m-1} \cdot 3^{m-1}) + (b_{m-2} \cdot 3^{m-2}) + \cdots + (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \qquad (3.58)$$

and

$$\text{REVCOUNT} = (b_0 \cdot 3^{m-1}) + (b_1 \cdot 3^{m-2}) + \cdots + (b_{m-2} \cdot 3^1) + (b_{m-1} \cdot 3^0) \qquad (3.59)$$

Once COUNT and REVCOUNT are computed, their magnitudes are
compared. If REVCOUNT is less than or equal to COUNT, a
swap of the values indexed by COUNT and REVCOUNT is not
required; otherwise the array value indexed by COUNT is
exchanged with the array value indexed by REVCOUNT. COUNT is
then incremented by one (and REVCOUNT updated to its digit
reversal), and the process continues until all N indices
have been tested.
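The COUNT/REVCOUNT procedure can be sketched in Python as follows. This is an illustration under one simplifying assumption: REVCOUNT is recomputed from COUNT by reversing its base-p digits instead of being advanced as a separate reversed counter. The function names `digit_reverse` and `digit_reverse_reorder` are hypothetical.

```python
def digit_reverse(count, radix, digits):
    """REVCOUNT for a given COUNT: reverse its base-`radix` digits."""
    rev = 0
    for _ in range(digits):
        count, b = divmod(count, radix)   # peel off b_0, then b_1, ...
        rev = rev * radix + b             # b_0 ends up with the largest weight
    return rev


def digit_reverse_reorder(a, radix):
    """Reorder list `a` (length radix**m) in place into digit-reversed order."""
    n = len(a)
    digits, size = 0, 1
    while size < n:
        size *= radix
        digits += 1
    if size != n:
        raise ValueError("length must be a power of the radix")
    for count in range(n):
        revcount = digit_reverse(count, radix, digits)
        # No swap when REVCOUNT <= COUNT: that pair was already exchanged
        # (or the index maps to itself).
        if revcount > count:
            a[count], a[revcount] = a[revcount], a[count]
    return a
```

For N = 9 and radix 3 the reordered index list is [0, 3, 6, 1, 4, 7, 2, 5, 8]; with radix 2 and N = 8 the same routine gives the bit-reversed order of Eq (3.53).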

3.2.5 Development of a Radix-3 FFT Using the Complex
Cube Root of Unity. This section presents the theory of
a radix-3 FFT algorithm which uses the complex cube root of
unity to perform the complex Fourier transformation (butterfly)
without using multiplications. The benefit of this
technique will also be discussed in the section on real
operations count.

While the original presentation by Dubois and Venetsanopoulos
(1978) describes this technique, it leaves out several steps
which aid in understanding the theory, and for that reason
the theory is presented again here.

This algorithm uses the basis vectors (1, u) instead of the
conventional complex plane vectors (1, j) to perform the
complex Fourier transform (where u is a complex cube root of 1
and j is the square root of -1). The new basis vectors
use the arithmetic notation:

$$a + bu \in R(u); \qquad a, b \ \text{real numbers} \qquad (3.60)$$

Taking u as a complex cube root of 1 implies:

$$u^3 - 1 = 0 \qquad (3.61)$$

or

$$(u - 1)(u^2 + u + 1) = 0 \qquad (3.62)$$

Since $u \neq 1$, then

$$u^2 + u + 1 = 0 \qquad (3.63)$$

or

$$u^2 = -1 - u \qquad (3.64)$$

Eq (3.60) is used in the definition of multiplication in
the R(u) field:

$$(a + bu)(c + du) = ac + bd\,u^2 + ad\,u + bc\,u \qquad (3.65)$$

Substituting Eq (3.64) into Eq (3.65) results in:

$$(a + bu)(c + du) = (ac - bd) + \big(ad + b(c - d)\big)u \qquad (3.66)$$

The expression in Eq (3.66) can be expanded and then
recombined to reduce the number of multiplications:

$$\begin{aligned}
ad + b(c - d) &= ad + bc - bd + bd - bd + ac - ac && (3.67)\\
&= ac + ad + bc + bd - ac - bd - bd && (3.68)\\
&= (a + b)(c + d) - ac - bd - bd && (3.69)
\end{aligned}$$

Substituting Eq (3.69) into Eq (3.66) gives:

$$(a + bu)(c + du) = (ac - bd) + \big((a + b)(c + d) - ac - bd - bd\big)u \qquad (3.70)$$

The result in Eq (3.70) requires three real multiplications
and six real additions, compared with conventional complex
multiplication, which requires four real multiplications and
two real additions. Multiplication in the R(u) field requires
one fewer multiplication but four more additions.
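The three-multiplication product of Eq (3.70) is easy to check numerically. In this Python sketch, `ru_mul` is a hypothetical helper that takes the coordinates a, b, c, d and returns the pair (real part, u part); `U` is $\exp(-j2\pi/3)$, the value of u derived in the text that follows.

```python
import cmath

U = cmath.exp(-2j * cmath.pi / 3)   # u = exp(-j 2 pi / 3), a cube root of 1


def ru_mul(a, b, c, d):
    """Multiply (a + bu)(c + du) in R(u) using Eq (3.70).

    Uses 3 real multiplications (ac, bd, (a+b)(c+d)) and 6 real additions.
    Returns the pair (real part, u part).
    """
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d)
    return ac - bd, cross - ac - bd - bd
```

For example, `ru_mul(2.0, -3.0, 0.5, 4.0)` returns `(13.0, 18.5)`, and $13 + 18.5u$ agrees with the complex product $(2 - 3u)(0.5 + 4u)$.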

The expression for u is obtained from $u^3 = 1$ by letting
$u^3 = (\exp(-j2\pi/3))^3 = 1$. Consequently, $u = \exp(-j2\pi/3) = -1/2 - j(\sqrt{3}/2)$,
which is used for conversion between $a + bj$ and $c + du$:

$$c + du = c + d\left(-\tfrac{1}{2} - j\tfrac{\sqrt{3}}{2}\right) = c - \tfrac{d}{2} - j\tfrac{\sqrt{3}}{2}d$$

$$c + du = \left(c - \tfrac{d}{2}\right) + j\left(-\tfrac{\sqrt{3}}{2}\right)d$$
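To find the conversion from $a + bj$ to $c + du$, solve the
preceding expression for j:

$$\begin{aligned}
c + du &= \left(c - \tfrac{d}{2}\right) + \left(-\tfrac{\sqrt{3}}{2}d\right) j\\
\tfrac{d}{2} + du &= \left(-\tfrac{\sqrt{3}}{2}\right) d\, j\\
d\left(\tfrac{1}{2} + u\right) &= \left(-\tfrac{\sqrt{3}}{2}\right) d\, j\\
\tfrac{1}{2} + u &= \left(-\tfrac{\sqrt{3}}{2}\right) j\\
j &= \left(-\tfrac{2}{\sqrt{3}}\right)\left(\tfrac{1}{2} + u\right)
\end{aligned} \qquad (3.73)$$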


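Using Eq (3.73) with $a + bj$, the conversion to $c + du$ is:

$$\begin{aligned}
a + bj &= a + b\left(-\tfrac{2}{\sqrt{3}}\right)\left(\tfrac{1}{2} + u\right)\\
&= a + b\left(-\tfrac{2}{\sqrt{3}}\right)\left(\tfrac{1}{2}\right) + b\left(-\tfrac{2}{\sqrt{3}}\right)u\\
a + bj &= \left(a - \tfrac{b}{\sqrt{3}}\right) + \left(-\tfrac{2b}{\sqrt{3}}\right)u
\end{aligned} \qquad (3.74)$$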

Using the R(u) arithmetic developed above, it can be
shown that a radix-3 FFT butterfly can be developed which
requires no multiplications except for the twiddle factors
in Figure 3.13.

Using Eq (3.74) and $W_N^{r} = \cos(2\pi r/N) + j\big(-\sin(2\pi r/N)\big)$
produces:

$$c + du = \left(\cos(2\pi r/N) + \frac{\sin(2\pi r/N)}{\sqrt{3}}\right) + \left(\frac{2\sin(2\pi r/N)}{\sqrt{3}}\right)u \qquad (3.75)$$

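Using the substitution N = 3r in Eq (3.75) reduces it to:

$$\begin{aligned}
W_N^{r} &= \left(\cos(2\pi/3) + \frac{\sin(2\pi/3)}{\sqrt{3}}\right) + \left(\frac{2\sin(2\pi/3)}{\sqrt{3}}\right)u\\
&= 0 + 1u = u
\end{aligned} \qquad (3.76)$$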

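Likewise, the remaining W terms in Figure 3.13 can be reduced:

$$\begin{aligned}
W_N^{2r} &= \left(\cos(4\pi/3) + \frac{\sin(4\pi/3)}{\sqrt{3}}\right) + \left(\frac{2\sin(4\pi/3)}{\sqrt{3}}\right)u\\
&= -1 - 1u
\end{aligned} \qquad (3.77)$$

$$W_N^{4r} = 0 + 1u = u \qquad (3.78)$$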
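The conversion of Eq (3.74) and its inverse can be sketched in Python as below (hypothetical names `complex_to_ru`, `ru_to_complex`, and `twiddle_in_ru`). Converting the twiddle factors for N = 3r reproduces Eqs (3.76) through (3.78) to within rounding error.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)


def complex_to_ru(z):
    """a + bj  ->  (c, d) with c + du = a + bj, per Eq (3.74)."""
    a, b = z.real, z.imag
    return a - b / SQRT3, -2.0 * b / SQRT3


def ru_to_complex(c, d):
    """c + du  ->  (c - d/2) + j(-sqrt(3)/2)d, using u = -1/2 - j sqrt(3)/2."""
    return complex(c - d / 2.0, -SQRT3 / 2.0 * d)


def twiddle_in_ru(r, n):
    """W_N^r = cos(2 pi r/N) - j sin(2 pi r/N) expressed in R(u), Eq (3.75)."""
    return complex_to_ru(cmath.exp(-2j * math.pi * r / n))
```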

Substituting Eqs (3.76) through (3.78) into Figure 3.13
produces Figure 3.16.

No multipliers are required to evaluate the butterfly flowgraph.
In the following equations, $X_i + Y_i u$ ($i = 1, 2, 3$) are the
butterfly inputs after twiddle factor multiplication, and
$A(\cdot) + B(\cdot)u$ are the butterfly outputs in the
R(u) field.

$$A(1) + B(1)u = (X_1 + X_2 + X_3) + (Y_1 + Y_2 + Y_3)u \qquad (3.79)$$

$$\begin{aligned}
A(2) + B(2)u &= (X_1 + Y_1 u) + (X_2 + Y_2 u)(0 + u) + (X_3 + Y_3 u)(-1 - u)\\
&= (-Y_2) + (X_2 - Y_2)u + (-X_3 + Y_3) + (-X_3)u + X_1 + Y_1 u\\
&= (X_1 - Y_2 - X_3 + Y_3) + (Y_1 + X_2 - Y_2 - X_3)u
\end{aligned} \qquad (3.80)$$

$$\begin{aligned}
A(3) + B(3)u &= (X_1 + Y_1 u) + (X_2 + Y_2 u)(-1 - u) + (X_3 + Y_3 u)(0 + u)\\
&= X_1 + Y_1 u + (-X_2 + Y_2) + (-X_2)u + (-Y_3) + (X_3 - Y_3)u\\
&= (X_1 - X_2 + Y_2 - Y_3) + (Y_1 - X_2 + X_3 - Y_3)u
\end{aligned} \qquad (3.81)$$

There are 16 real additions shown in Eqs (3.79) through
(3.81); however, by combining the common terms $-Y_2 - X_3 = -R$
and $-X_2 - Y_3 = -S$, the radix-3 butterfly can be evaluated
using only fourteen real additions (neglecting the twiddle
factors):

$$\begin{aligned}
A(1) &= X_1 + X_2 + X_3\\
B(1) &= Y_1 + Y_2 + Y_3\\
A(2) &= X_1 + Y_3 - R, \qquad \text{where } R = Y_2 + X_3\\
B(2) &= Y_1 + X_2 - R\\
A(3) &= X_1 + Y_2 - S, \qquad \text{where } S = X_2 + Y_3\\
B(3) &= Y_1 + X_3 - S
\end{aligned}$$
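The fourteen-addition butterfly can be sketched in Python and checked against the ordinary complex radix-3 butterfly $F + W^{s}G + W^{2s}H$. The helpers `complex_to_ru`, `ru_to_complex`, and `ru_butterfly3` are hypothetical; the inputs are assumed to have already been multiplied by their twiddle factors and converted to R(u).

```python
import math

SQRT3 = math.sqrt(3.0)


def complex_to_ru(z):
    """a + bj -> (c, d) with c + du = a + bj (Eq 3.74)."""
    return z.real - z.imag / SQRT3, -2.0 * z.imag / SQRT3


def ru_to_complex(c, d):
    """c + du -> complex value, with u = -1/2 - j sqrt(3)/2."""
    return complex(c - d / 2.0, -SQRT3 / 2.0 * d)


def ru_butterfly3(x1, y1, x2, y2, x3, y3):
    """Radix-3 butterfly in R(u) on inputs X_i + Y_i u (already twiddled).

    Uses the shared terms R and S: 14 real additions, no multiplications.
    Returns [(A(1), B(1)), (A(2), B(2)), (A(3), B(3))].
    """
    r = y2 + x3
    s = x2 + y3
    return [
        (x1 + x2 + x3, y1 + y2 + y3),     # Eq (3.79)
        (x1 + y3 - r, y1 + x2 - r),       # Eq (3.80)
        (x1 + y2 - s, y1 + x3 - s),       # Eq (3.81)
    ]
```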

3.2.6 Summary. This completes the discussion of
fixed radix FFT theory. In this section the general theory
was developed using the radix-3 case as an alternative to
the more common radix-2 development. A decimation-in-time
for N = 9 was shown and the basic butterfly equations for
radix-3 were derived. Because of the similarity to radix-3
butterflies, the radix-5 theory was not developed, but the
butterfly equations necessary to implement a radix-5 FFT
were given. Finally, a new radix-3 FFT (Dubois and
Venetsanopoulos, 1978) was developed.

3.3 Real Operations Count for Fixed Radix FFTs

The speed at which an FFT algorithm can perform the
DFT is (to a first approximation) proportional to the
number of complex multiplications used in the algorithm
(Singleton, 1969). The number of times the data array is
indexed is a secondary factor and is shown to have minimal
impact on the results of this paper.

An anomaly in the nomenclature should be pointed out
before further discussion of "complex multiplications"
related to FFTs. A complex multiplication implies four
real multiplications and two real additions. It has been
shown (Singleton, 1969) that $(p-1)^2$ real multiplications
are required to evaluate a complex transform of dimension
$p$, $p$ odd, where $N = p^m$. Singleton then refers to these $(p-1)^2$
real multiplications in terms of complex multiplications,
although a complex transform of dimension $p$ also requires
more than $(p-1)^2/2$ real additions.
Throughout this paper all references to multiplications and
additions are in terms of real operations and not complex
operations.

The real operations are determined from (1) the number
of butterflies times the number of real operations required
to compute the butterfly, (2) the number of twiddle
factors times the real operations required per twiddle factor,
and (3) the number of trigonometric functions (sine and
cosine) which must be computed. The real operations counts
for the radix-p FFTs are derived as functions of N, m, and
p, where $N = p^m$.

3.3.1 Number of Butterflies in Fixed Radix-p FFTs.
The number of butterflies depends on N, m, and p,
where $N = p^m$. Examining the radix-2 FFT in Figure 3.8 shows
that there are 8 input points and 8 output points for each
stage. The radix-2 butterfly in Figure 3.7 has 2 input
and 2 output points, which means that Figure 3.8 must have
8/2 = 4 butterflies per stage. There are 3 stages in this
radix-2 FFT (where $N = 2^3$), giving a total of 12 butterflies
in this FFT.

In general, the number of radix-p butterflies is given
by:

$$mN/p \qquad (3.82)$$

This equation can be checked for the radix-3 example.
Given that N = 9, p = 3, and m = 2, Eq (3.82) gives the total
number of butterflies as $2 \cdot 9/3 = 6$, which is verified
by Figure 3.11, which has 6 radix-3 butterflies.

3.3.2 Number of Twiddle Factors in Fixed Radix-p
FFTs. The twiddle factors are complex multipliers of the
form $W_N^{k} = \exp(-j2\pi k/N)$ which multiply each radix-p butterfly
as shown in Figure 3.8. Notice that each stage has $N/p = 8/2 = 4$
butterflies, each of which requires $p - 1 = 2 - 1 = 1$
complex twiddle factor. The general expression for the number
of twiddle factors in each stage becomes:

$$N(p-1)/p \qquad (3.84)$$

Given that $N = p^m$, there are m stages in a radix-p FFT, making
the total number of twiddle factors for the FFT equal to:

$$mN(p-1)/p \qquad (3.85)$$

Some of the complex twiddle factors are $W_N^{0} = 1$ and can be
eliminated. In any FFT there are N - 1 of these unity twiddle
factors (Singleton, 1969), which gives the final expression
for the number of complex twiddle factors as:

$$mN(p-1)/p - (N-1) \qquad (3.86)$$

Using $N = p^m = 2^3 = 8$ in Eq (3.86), the number of twiddle
factors is found to be 5. Examining Figure 3.8 for $N = 2^3$
shows there are 5 non-unity twiddle factors.
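Eqs (3.82) and (3.86) can be cross-checked by enumerating the twiddle factors stage by stage. The Python sketch below assumes the decimation-in-time stage structure of Figure 3.8, in which the stage with butterfly span L uses the twiddle factors $W_{pL}^{tj}$ for $j = 0, \ldots, L-1$ and $t = 1, \ldots, p-1$, each shared by $N/(pL)$ butterflies; all function names are hypothetical.

```python
def stages(n, p):
    """Number of stages m for N = p**m (raises if N is not a power of p)."""
    m, size = 0, 1
    while size < n:
        size *= p
        m += 1
    if size != n:
        raise ValueError(f"{n} is not a power of {p}")
    return m


def butterfly_count(n, p):
    """Eq (3.82): total radix-p butterflies, mN/p."""
    return stages(n, p) * n // p


def twiddle_count(n, p):
    """Eq (3.86): non-unity complex twiddle factors, mN(p-1)/p - (N-1)."""
    return stages(n, p) * n * (p - 1) // p - (n - 1)


def twiddle_count_enumerated(n, p):
    """Independent check: walk the stages of a DIT radix-p FFT and count
    the twiddle factors W^{t j} (t = 1..p-1) that are not equal to 1."""
    stages(n, p)                        # validate n
    count, span = 0, 1
    while span < n:
        group = span * p                # points covered by one butterfly group
        for j in range(span):
            for t in range(1, p):
                if (t * j) % group:     # W_group^{t j} != 1
                    count += n // group
        span = group
    return count
```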

3.3.3 Number of Trigonometric Functions Required
for the Fixed Radix Algorithms. The trigonometric functions
sine and cosine are used to compute the twiddle factors.
The fixed radix-2 algorithm uses calls to the FORTRAN
library SIN and COS functions as well as the difference
equations given in Section 3.1. The radix-3 and 5 FFTs
use the difference equations.

The radix-2 algorithm in Appendix A computes one sine
and cosine at each stage of the FFT using:

W = CMPLX(COS(PI/LE1), SIN(PI/LE1))

Each radix-2 FFT has m stages, where $N = 2^m$, which means the
sine and cosine functions are called m times for the FFT.
Once the initial sine and cosine are computed for the
stage, each new twiddle factor in the stage is computed
using the complex multiplication:

U = U * W

where the complex U was originally initialized to U = (1,0).
The complex multiplication U * W effectively implements
the sine and cosine difference equations in Section 3.1.

The number of times U * W is computed for each FFT stage
is a function of the number of different twiddle factors in
the stage. In Figure 3.8 the first stage has only one
type of twiddle factor, $W_8^0$; the second stage has two types,
$W_8^0$ and $W_8^2$; and the third stage has four: $W_8^0$, $W_8^1$, $W_8^2$, $W_8^3$.
The general expression for the number of types of twiddle
factors in stage k is:

$$TF = 2^{k-1}$$

Thus for stage 1, k = 1 and $TF = 2^0 = 1$, which gives one type
of twiddle factor; for stage 2, k = 2 and $TF = 2^1 = 2$, giving two
types of twiddle factors; and finally for the last stage
in this example, k = 3 and $TF = 2^2 = 4$, so four types of twiddle
factors are required. In general, for the radix-2 FFT in
Appendix A the complex multiplication U * W is evaluated a total of
$\sum_{k=1}^{m} 2^{k-1}$ times, where m is the number of stages for $N = 2^m$.
Given that the complex multiplication requires 4 real multiplications
and 2 real additions, the number of operations required
to compute sines and cosines for this radix-2 FFT is:

$$\text{real mult} = 4\sum_{k=1}^{m} 2^{k-1} \qquad (3.87)$$

$$\text{real add} = 2\sum_{k=1}^{m} 2^{k-1} \qquad (3.88)$$

$$\text{sine and cosine calls} = m \qquad (3.89)$$
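The U = U * W recurrence can be sketched in Python to confirm the counts in Eqs (3.87) through (3.89). This is not the Appendix A code: the name `radix2_stage_twiddles` is hypothetical, and the sign of the sine is chosen (an assumption) so that W equals $\exp(-j\pi/2^{k-1})$ in stage k, matching $W_N = \exp(-j2\pi/N)$.

```python
import math


def radix2_stage_twiddles(n):
    """Generate the radix-2 twiddle factors stage by stage with one COS/SIN
    pair per stage and the update U = U * W.

    Returns (tables, uw_updates, trig_calls); tables[k-1] holds the
    2**(k-1) twiddle types of stage k.
    """
    m = n.bit_length() - 1
    if n != 2 ** m:
        raise ValueError("n must be a power of 2")
    tables, uw_updates, trig_calls = [], 0, 0
    for k in range(1, m + 1):
        types = 2 ** (k - 1)                     # twiddle types in stage k
        # W = exp(-j pi / types), i.e. W_{2^k}^1 (sign is an assumption).
        w = complex(math.cos(math.pi / types), -math.sin(math.pi / types))
        trig_calls += 1
        u = 1 + 0j                               # U initialized to (1, 0)
        stage = []
        for _ in range(types):
            stage.append(u)
            u = u * w                            # one complex multiplication
            uw_updates += 1
        tables.append(stage)
    return tables, uw_updates, trig_calls
```

For N = 8 the update runs 1 + 2 + 4 = 7 times, giving 28 real multiplications and 14 real additions by Eqs (3.87) and (3.88), with 3 sine/cosine evaluations.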

The real operations required to compute the sine and
cosine lookup tables for the radix-3 and 5 algorithms are
less complex than for the radix-2 FFT. In those algorithms
the difference equations from Section 3.1 are used to compute
sine and cosine lookup tables which have length N. Because
of the symmetry sin(-k) = -sin(k), only N/2 computations
of the difference equations are required. The equations
are given by:

WKC(I) = C * WKC(I-1) - S * WKS(I-1) + WKC(I-1)
WKS(I) = C * WKS(I-1) + S * WKC(I-1) + WKS(I-1)

which need a total of 4 real multiplications and 10 additions
to compute. For an N length sequence, computing the lookup
tables requires:

$$\text{real mult} = 4(N/2) = 2N \qquad (3.90)$$

$$\text{real add} = 10(N/2) = 5N \qquad (3.91)$$
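A Python sketch of the lookup-table recurrence follows. The text does not define C and S; this sketch assumes $C = \cos\theta - 1$ and $S = \sin\theta$ with $\theta = 2\pi/N$, which is the choice that makes the two difference equations rotate (WKC, WKS) by θ at each step. The function name `trig_tables` is hypothetical, and the second half of each table is filled from symmetry.

```python
import math


def trig_tables(n):
    """Build WKC(i) = cos(i*theta) and WKS(i) = sin(i*theta), theta = 2*pi/n,
    with the difference equations (assumed C = cos(theta) - 1, S = sin(theta))."""
    theta = 2.0 * math.pi / n
    c = math.cos(theta) - 1.0          # assumed meaning of C
    s = math.sin(theta)                # assumed meaning of S
    wkc = [0.0] * n
    wks = [0.0] * n
    wkc[0], wks[0] = 1.0, 0.0
    # Recurrence for the first half only (about N/2 iterations).
    for i in range(1, n // 2 + 1):
        wkc[i] = c * wkc[i - 1] - s * wks[i - 1] + wkc[i - 1]
        wks[i] = c * wks[i - 1] + s * wkc[i - 1] + wks[i - 1]
    # Second half from symmetry: cos(2*pi - t) = cos(t), sin(2*pi - t) = -sin(t).
    for i in range(n // 2 + 1, n):
        wkc[i] = wkc[n - i]
        wks[i] = -wks[n - i]
    return wkc, wks
```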

3.3.4 Number of Operations in Radix-p FFTs.
Based on the general expressions in Eqs (3.82) through
(3.91), the total number of real operations can be
determined for given integers N, p, and m.
First, each radix-p butterfly computation requires multiplications
or additions or both to be evaluated. The
exact number of multiplies and adds is determined from the
FORTRAN code as shown below. Second, each complex twiddle
factor multiplication requires 4 real multiplications and
2 real additions. Third, the number of real operations to
compute the sines and cosines is added to the butterflies
and twiddle factors to give the total operations count for
each algorithm.

For the case of $N = 2^m$ it was shown in Section 3.2.1
that the radix-2 butterfly can be computed with
4 real additions and no multiplications. However, this radix-2 FFT
does not eliminate the multiplications by $W_N^{0} = 1$.
Therefore each radix-2 butterfly is multiplied
by a complex twiddle factor, as shown in Figure 3.8. For this
particular radix-2 FFT the number of twiddle factors equals
the number of butterflies. Combining all sources of real
operations for the radix-2 FFT gives a total of:

$$\text{real mult} = (\#\ \text{mult per butterfly})(\#\ \text{butterflies}) + 4(\#\ \text{twiddle factors}) + 4(\#\ \text{types of twiddle factors}) \qquad (3.92)$$

Substituting the appropriate values for the radix-2 gives:

$$\text{real mult} = 0\cdot\frac{mN}{2} + 4\cdot\frac{mN}{2} + 4\sum_{k=1}^{m} 2^{k-1} = 2mN + 4\sum_{k=1}^{m} 2^{k-1} \qquad (3.93)$$

Likewise for the number of real additions:

$$\text{real adds} = (\#\ \text{adds per butterfly})(\#\ \text{butterflies}) + 2(\#\ \text{twiddle factors}) + 2(\#\ \text{types of twiddle factors}) \qquad (3.94)$$

$$\text{real adds} = 4\cdot\frac{mN}{2} + 2\cdot\frac{mN}{2} + 2\sum_{k=1}^{m} 2^{k-1} = 3mN + 2\sum_{k=1}^{m} 2^{k-1} \qquad (3.95)$$


For the radix-p FFTs where p is an odd prime, it has
been shown by Singleton (1969) that these butterflies can
be evaluated using $(p-1)^2$ real multiplications. The
FORTRAN coded radix-3 and radix-5 FFTs in Appendices B and D
require 4 real multiplications and 12 additions for radix-3
butterflies and 16 real multiplications and 30 additions
for radix-5 butterflies. Using these values in Eqs (3.82)
through (3.91) yields the total real operations for the radix-3 as:
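
$$\begin{aligned}
\text{real mult} &= (4\ \text{mult per butterfly})\cdot\frac{mN}{3} + 4\left(\frac{mN(3-1)}{3} - (N-1)\right) + 2N\\
&= \frac{4mN}{3} + \frac{8mN}{3} - 4(N-1) + 2N\\
&= 4mN - 4(N-1) + 2N
\end{aligned} \qquad (3.96)$$

$$\begin{aligned}
\text{real adds} &= (12\ \text{adds per butterfly})\cdot\frac{mN}{3} + 2\left(\frac{mN(3-1)}{3} - (N-1)\right) + 5N\\
&= \frac{12mN}{3} + \frac{4mN}{3} - 2(N-1) + 5N\\
&= \frac{16mN}{3} - 2(N-1) + 5N
\end{aligned} \qquad (3.97)$$

Similarly, the real operations count for the radix-5 FFT
becomes:

$$\begin{aligned}
\text{real mult} &= (16\ \text{mult per butterfly})\cdot\frac{mN}{5} + 4\left(\frac{mN(5-1)}{5} - (N-1)\right) + 2N\\
&= \frac{16mN}{5} + \frac{16mN}{5} - 4(N-1) + 2N\\
&= \frac{32mN}{5} - 4(N-1) + 2N
\end{aligned} \qquad (3.98)$$

$$\begin{aligned}
\text{real adds} &= (30\ \text{adds per butterfly})\cdot\frac{mN}{5} + 2\left(\frac{mN(5-1)}{5} - (N-1)\right) + 5N\\
&= \frac{30mN}{5} + \frac{8mN}{5} - 2(N-1) + 5N\\
&= \frac{38mN}{5} - 2(N-1) + 5N
\end{aligned} \qquad (3.99)$$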

The results of Eqs (3.92) through (3.99) are given in Table 3.1 for N between 8 and 16,000. This table also summarizes the possible values of N for the fixed radix-2, 3, and 5 FFTs.
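A similar sketch covers Eqs (3.96) through (3.99), assuming N = p^m with p = 3 or 5 and the per-butterfly costs quoted above (4 multiplications and 12 additions for radix-3, 16 and 30 for radix-5). The name `radix_p_real_ops` is hypothetical, and the 2N and 5N terms are carried over from the equations exactly as given.

```python
def radix_p_real_ops(p, m):
    """Real multiplications and additions for an N = p**m fixed-radix FFT
    (p = 3 or 5), following Eqs (3.96)-(3.99)."""
    per_butterfly = {3: (4, 12), 5: (16, 30)}   # (mults, adds) quoted in the text
    mult_bf, add_bf = per_butterfly[p]
    n = p ** m
    butterflies = m * n // p                    # mN/p
    twiddles = m * n * (p - 1) // p - (n - 1)   # mN(p-1)/p - (N-1)
    mults = mult_bf * butterflies + 4 * twiddles + 2 * n
    adds = add_bf * butterflies + 2 * twiddles + 5 * n
    return mults, adds

print(radix_p_real_ops(3, 2), radix_p_real_ops(5, 2))
```

For example, N = 9 gives 58 multiplications and 125 additions, and N = 25 gives 274 and 457.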

TABLE 3.1  REAL OPERATIONS

      N     Radix    Multiplications    Additions
      8     2^3             76               86
      9     3^2             58              125
     16     2^4            188              222
     25     5^2            274              457
     27     3^3            274              515
     32     2^5            444              542
     64     2^6           1020             1278
     81     3^4           1138             1973
    125     5^3           2154             3227
    128     2^7           2300             2942
    243     3^5           4378             7211
    256     2^8           5116             6654
    512     2^9          11260            14846
    625     5^4          14754            20877
    729     3^6          16042            25517
   1024     2^10         24572            32766
   2048     2^11         53244            71678
   2187     3^7          56866            88211
   3125     5^5          93754           128127
   4096     2^12        114684           155646
   6561     3^8         196834           299621
   8192     2^13        245756           335870
  15625     5^6         568754           759377

3.3.5 Real Operations Count for the Radix-3 FFT Using the Complex Cube Root of Unity. This algorithm represents an alternative to the conventional radix-3 FFT. It is shown in this section that selective use of this algorithm can reduce the number of real operations, depending on the sequence length.

The radix-3 FFT in the R(u) field has four sources of real multiplications (where $N = 3^m$):

1. $2mN/3 - (N-1)$ complex twiddle factors, derived in Section 3.3.3.

2. Conversion from complex to R(u) of $\sum_{i=2}^{m} 2(3^{i-1} - 1)$ twiddle factors, derived from the FORTRAN code in Appendix C.

3. Conversion of the complex array of length N to the R(u) field, derived from the FORTRAN code.

4. Conversion of the R(u) array of length N back to the complex field, derived from the FORTRAN code.

The radix-3 in R(u) has five sources of real additions:

1. $mN/3$ butterflies, derived in Section 3.3.3.

2. The four sources of real multiplications listed above.

Based on the FORTRAN code in Appendix C, there are three real multiplications per complex twiddle factor, two per twiddle factor conversion, two per conversion from complex to the R(u) field, and two per conversion from R(u) to the complex field. Condensing the above into an equation for real multiplications yields:


$$\text{real mult} = 3\left(\frac{2mN}{3} - N + 1\right) + 2\sum_{i=2}^{m} 2\left(3^{i-1} - 1\right) + 4N \tag{3.100}$$

There are 14 real additions per butterfly, six per twiddle factor, one per twiddle factor conversion, one per conversion to the R(u) array, and one per conversion to the complex array. Expressing the total number of real additions as a function of the above yields:

$$\text{real adds} = 14\,\frac{mN}{3} + 6\left(\frac{2mN}{3} - N + 1\right) + \sum_{i=2}^{m} 2\left(3^{i-1} - 1\right) + 2N \tag{3.101}$$

The results for the number of real multiplications and additions for both radix-3 algorithms are given in Table 3.2 for N = 27 to N = 19,683. Because the R(u) radix-3 requires more multiplications and additions for N = 27 and 81, it will always run slower than the complex field radix-3 FFT at those lengths. But for N = 243 and higher, the R(u) radix-3 may run faster, depending upon the speed of additions relative to multiplications on the computer being used to perform the FFTs.

Table 3.2 also gives the "Add to Multiply Ratio" required for the R(u) field radix-3 FFT to run faster than the conventional radix-3 FFT. (The ratio is the difference in the number of additions divided by the difference in the number of multiplications.) For the case of N = 729, a multiply operation must take 3.77 times longer than an addition before the R(u) field radix-3 can run faster than the complex field radix-3. This means that, prior to selecting either of the algorithms, the relative cost of additions to multiplications must be known, as well as the length of the data sequence.
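The comparison behind Table 3.2 can be sketched the same way. The functions below are hypothetical. The complex-field counts are Eqs (3.96) and (3.97) with the 2N and 5N terms dropped, an assumption based on the table footnote that sine and cosine terms are excluded; the R(u) counts are Eqs (3.100) and (3.101). The ratio is computed as the extra additions of the R(u) version divided by the multiplications it saves.

```python
def complex_radix3_ops(m):
    """Complex-field radix-3 counts without sine/cosine terms (assumed to be
    Eqs (3.96)-(3.97) minus their 2N and 5N terms)."""
    n = 3 ** m
    return 4 * m * n - 4 * (n - 1), 16 * m * n // 3 - 2 * (n - 1)

def ru_radix3_ops(m):
    """R(u)-field radix-3 counts from Eqs (3.100) and (3.101)."""
    n = 3 ** m
    twiddles = 2 * m * n // 3 - n + 1
    conversions = sum(2 * (3 ** (i - 1) - 1) for i in range(2, m + 1))
    mults = 3 * twiddles + 2 * conversions + 4 * n
    adds = 14 * (m * n // 3) + 6 * twiddles + conversions + 2 * n
    return mults, adds

def add_to_multiply_ratio(m):
    """Extra additions of R(u) divided by the multiplications it saves."""
    cm, ca = complex_radix3_ops(m)
    rm, ra = ru_radix3_ops(m)
    return (ra - ca) / (cm - rm)

for m in (5, 6):
    print(3 ** m, round(add_to_multiply_ratio(m), 2))
```

For N = 243 this gives 3800/752, which rounds to the 5.05 listed in Table 3.2, and for N = 729 it gives 3.77.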

TABLE 3.2  COMPARISON BETWEEN COMPLEX AND R(u) RADIX-3 FFT FOR REAL OPERATIONS*

              Complex Radix-3           R(u) Radix-3          Add to
      N     Real Mult   Real Adds    Real Mult   Real Adds    Mult Ratio
     27        220         380          232         624          NA
     81        976        1568         1284        2562          NA
    243       3892        5996         3140        9796         5.05
    729      14584       21872        10912       35714         3.77
   2187      52492       77276        37152      126108         3.16
   6561     183712      266816       124628      435202         2.85
  19683     629860      905420       413308     1476212         2.63

* Does not include computing sine and cosine terms.

3.3.6 Memory Requirements for Fixed Radix FFTs. A major consideration for selecting a particular FFT algorithm is the sequence length and the memory required to execute the subroutine relative to the memory available in the computer. For this reason the memory requirements for the radix-2, 3, and 5 FFTs are given here as a function of sequence length N. The program memory and data array storage requirements for each algorithm are enumerated below.

The program memory required by each routine was determined from a "load map" generated by the command MAP, PART. The array storage requirements were determined by inspection of the DIMENSION statements in the FORTRAN code for each subroutine listed in Appendices A to D. The results are (M is the number of factors of N, i.e., m for N = p^m):

FFT                  Program    Arrays
Radix-2                108      2N
Radix-3                301      4N + M + 30
Radix-3 in R(u)        396      4N + M + 30
Radix-5                458      4N + M + 30

The memory arrays required for each algorithm as a function of N are listed in Table 3.3. The program memory was not included because it depends on the machine word size, which varies from machine to machine.
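The array storage formulas above translate directly into a small helper. This sketch (the name `fixed_radix_array_words` is hypothetical) takes M as the number of factors of N, so M = m for N = p^m, and returns the number of array words.

```python
def fixed_radix_array_words(p, m):
    """Data-array storage for an N = p**m fixed-radix FFT:
    2N for radix-2, 4N + M + 30 for radix-3 (complex or R(u)) and radix-5."""
    n = p ** m
    if p == 2:
        return 2 * n
    if p in (3, 5):
        return 4 * n + m + 30   # M = m factors of N
    raise ValueError("only radix 2, 3 and 5 are covered")

print(fixed_radix_array_words(3, 2), fixed_radix_array_words(5, 4))
```

For N = 9 and N = 625 this gives 68 and 2534 words, the values listed in Table 3.3.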

3.4 Mixed Radix FFT Algorithms

Up to this point only fixed radix FFTs have been discussed. Explanation and programming for the special cases where $N = 2^m$, $3^m$, or $5^m$ are simpler than for the general case of $N = p_1 p_2 \cdots p_m$, and for most applications the restricted choice of values is adequate. However, when the application does not permit "zero padding" of the data sequence to reach one of the special cases, a wider choice of N is needed.
TABLE 3.3  FIXED RADIX MEMORY REQUIREMENTS

      N     Memory Arrays
      8          16
      9          68
     16          32
     25         132
     27         141
     32          64
     64         128
     81         358
    125         533
    128         256
    243        1007
    256         512
    512        1024
    625        2534
    729        2952
   1024        2048
   2048        4096
   2187        8820


Singleton (1969) published a mixed radix FFT algorithm which has been widely used and implemented on large and small computers. This algorithm is listed in Appendix F. (The International Mathematical and Statistical Library (IMSL), which is available on the CDC Cyber 74 computer, has a mixed radix FFT based on Singleton's work.) The author has also written and tested a mixed radix algorithm, which is listed in Appendix E. The theory, digit reversal, real operations count, and memory requirements for these algorithms are discussed in the following sections.

3.4.1 Mixed Radix Theory. All FFT theory can be developed by representing a one-dimensional sequence of length N as several two-dimensional matrices and performing operations on these matrices. Understanding this approach when first exposed to it is difficult. For this reason the matrix development is presented alone, and then a specific example of N = 30 is treated to increase understanding of the technique.

The complex Fourier transform is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N} \tag{3.102}$$

for k = 0, 1, ..., N-1, where X(k) and x(n) are both complex valued. Eq (3.102) can be expressed as a matrix multiplication: X = Tx.

The matrix T can be decimated in time (Cooley and Tukey, 1965) or in frequency (Gentleman and Sande, 1966) to produce equally efficient factorings:

$$T = P\, F_m \cdots F_2 F_1$$

where $F_i$ is the decomposition corresponding to the factor $n_i$ of

$$N = n_m \cdots n_2 n_1$$

and P is the permutation (digit reversal) matrix (Singleton, 1969). The matrix $F_i$ has only $n_i$ nonzero elements in each row and column and can be partitioned into $N/n_i$ square submatrices of dimension $n_i$; "it is this partition that is the basis for these mixed radix algorithms" (Singleton, 1969). The matrices $F_i$ can be further factored into:

$$F_i = R_i T_i \tag{3.103}$$

where $R_i$ is the diagonal matrix of twiddle (rotation) factors. Using these twiddle factors enables the trigonometric symmetries and complex multipliers (e.g., $e^{j\pi}$, $e^{j\pi/2}$, ...) to be exploited in the FFT butterflies and reduces the number of real operations. A specific decimation-in-time example which uses the above ideas is now considered.

Given an N-point sequence for which the N-point DFT is desired, the integer N can be factored into a product of smaller integers, assuming N is not prime. The successive factorization of one number into two can result in any possible combination. If N = 30, it can be factored as 5·6 and then as 5·3·2. The first decomposition is shown in Figure 3.17 and is realized as six 5-point DFTs followed by five 6-point DFTs. The next stage of decomposition is from 5·6 to 5·3·2 and is shown in Figure 3.18. Starting with the DFT expression in Eq (3.102), the sequence can be factored into N = p·q = 5·6 (representing a 5 by 6 matrix), and with $W_N = e^{-j2\pi/N}$ the DFT becomes:

$$X(k) = \sum_{m=0}^{p-1} W_N^{mk} \sum_{r=0}^{q-1} x(pr+m)\, W_N^{prk} \tag{3.104}$$

Now the inner sum can be expressed as the q-point DFT

$$G(k) = \sum_{r=0}^{q-1} x(pr+m)\, W_q^{rk} \tag{3.105}$$

since

$$W_N^{prk} = \exp(-j2\pi prk/N) = \exp(-j2\pi prk/pq) = W_q^{rk} \tag{3.106}$$

Using p = 5 and q = 6 in Eq (3.104) produces:

$$X(k) = \sum_{m=0}^{4} W_{30}^{mk} \sum_{r=0}^{5} x(5r+m)\, W_{30}^{5rk} \tag{3.107}$$


The inner sum in Eq (3.107) is a 6-point DFT which can be decomposed into a 3 by 2 matrix by dividing the sequence x(5r+m) into three sequences, each two points long. The inner summation in Eq (3.107) can be represented using the notation of Eq (3.104) as:

$$G(k) = \sum_{s=0}^{p-1} W_N^{sk} \sum_{t=0}^{q-1} g(pt+s)\, W_N^{ptk} \tag{3.108}$$

where N is now p·q = 3·2. Substituting for p and q yields:

$$G(k) = \sum_{s=0}^{2} W_6^{sk} \sum_{t=0}^{1} g(3t+s)\, W_6^{3tk} \tag{3.109}$$

This expression in Eq (3.109) is then inserted into Eq (3.107) to give:

$$X(k) = \sum_{m=0}^{4} W_{30}^{mk} \sum_{s=0}^{2} W_6^{sk} \sum_{t=0}^{1} x(15t+5s+m)\, W_6^{3tk} \tag{3.110}$$

where

$$r = 3t + s, \qquad g(3t+s) = x\big(5(3t+s)+m\big) = x(15t+5s+m)$$

$$W_6^{3tk} = \exp(-j2\pi \cdot 3tk/6) = \exp(-j2\pi \cdot tk/2) = W_2^{tk}$$

and m = 0, 1, 2, 3, 4; s = 0, 1, 2; t = 0, 1.

The complete flowgraph is shown in Figure 3.18 and implements Eq (3.110).
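To make the index mapping of Eq (3.110) concrete, the sketch below evaluates the three nested sums literally for N = 30 and compares the result with a direct evaluation of Eq (3.102). It is not an efficient FFT (every twiddle is recomputed); it only checks that x(15t + 5s + m), with the stated ranges of m, s, and t, covers every input once with the right exponent. The helper names and the test sequence are arbitrary choices for illustration.

```python
import cmath

def w(n, e):
    """Twiddle factor W_n**e = exp(-j 2 pi e / n)."""
    return cmath.exp(-2j * cmath.pi * e / n)

def dft(x):
    """Direct evaluation of Eq (3.102)."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]

def dft30_via_3_110(x):
    """Evaluate the nested sums of Eq (3.110) for N = 30 = 5 * 3 * 2."""
    out = []
    for k in range(30):
        total = 0j
        for m in range(5):
            inner = 0j
            for s in range(3):
                # innermost 2-point sum uses W_6**(3tk) = W_2**(tk)
                pair = sum(x[15 * t + 5 * s + m] * w(2, t * k) for t in range(2))
                inner += w(6, s * k) * pair
            total += w(30, m * k) * inner
        out.append(total)
    return out

x = [complex(i % 7, (3 * i) % 5) for i in range(30)]   # arbitrary test data
err = max(abs(a - b) for a, b in zip(dft(x), dft30_via_3_110(x)))
print(err < 1e-9)  # True
```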

3.4.2 Digit Reversal Algorithm (General). The permutation matrix P is required because the transformed result is in digit reversed order. Given a factorization of $N = n_m n_{m-1} \cdots n_2 n_1$, the Fourier coefficient X(k) with:

$$k = k_m n_{m-1} n_{m-2} \cdots n_1 + \cdots + k_2 n_1 + k_1 \tag{3.111}$$

where each digit satisfies $0 \le k_i \le n_i - 1$, is found in location:

$$k' = k_1 n_2 n_3 \cdots n_m + k_2 n_3 n_4 \cdots n_m + \cdots + k_m \tag{3.112}$$

In general the interchange of k with k' can be done "in place" if N is factored such that (Singleton, 1977):

$$n_i = n_{m-i+1} \tag{3.113}$$
for $i < m - i + 1$. For this factoring, k can be counted in natural order and k' in digit reversed order, as described for the fixed radix digit reversal.
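Eqs (3.111) and (3.112) can be sketched as follows (function names hypothetical). The factors are listed as (n_1, ..., n_m), with k_1 the least significant digit. The last two checks illustrate condition (3.113): when the factorization is symmetric, applying the map twice returns every index, so the reordering reduces to pair interchanges done in place; for an unsymmetric factorization it does not.

```python
from math import prod

def digits(k, factors):
    """Digits (k_1, ..., k_m) of k in Eq (3.111); factors = (n_1, ..., n_m)."""
    out = []
    for n_i in factors:
        k, d = divmod(k, n_i)
        out.append(d)        # 0 <= k_i <= n_i - 1, k_1 least significant
    return out

def digit_reversed(k, factors):
    """Location k' of X(k) from Eq (3.112)."""
    d = digits(k, factors)
    # k' = k_1 n_2 ... n_m + k_2 n_3 ... n_m + ... + k_m
    return sum(d[i] * prod(factors[i + 1:]) for i in range(len(factors)))

def is_permutation(factors):
    n = prod(factors)
    return sorted(digit_reversed(k, factors) for k in range(n)) == list(range(n))

def is_pairwise_swap(factors):
    """True if reversing twice restores every index (in-place pair swaps)."""
    n = prod(factors)
    return all(digit_reversed(digit_reversed(k, factors), factors) == k
               for k in range(n))

print(is_permutation((2, 3, 5)), is_pairwise_swap((2, 3, 2)), is_pairwise_swap((2, 3, 5)))
# True True False
```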

To implement this technique for mixed radices, N is factored into its prime factors and the "square" factors are arranged symmetrically around the "square-free" factors of N. For example, let N = 270 be factored as:

3 · 2 · 3 · 5 · 3

Now the reordering, P, is factored into:

$$P = P_2 P_1 \tag{3.114}$$

The reordering $P_1$ is "associated with the square factors of n and is done by pair interchanges as previously described, except that the digits of n corresponding to the square-free factors are held constant and the digits of the square factors are exchanged symmetrically" (Singleton, 1977). For example, if:


$$N = n_1 n_2 n_3 n_4 n_5 n_6 \tag{3.115}$$

with $n_1 = n_6$, $n_2 = n_5$, and $n_3$, $n_4$ relatively prime, the interchange associated with the square factors $n_1$, $n_2$, $n_5$, and $n_6$ is given by:

$$k = k_6 n_5 n_4 n_3 n_2 n_1 + k_5 n_4 n_3 n_2 n_1 + k_4 n_3 n_2 n_1 + k_3 n_2 n_1 + k_2 n_1 + k_1 \tag{3.116}$$

interchanged with:

$$k' = k_1 n_5 n_4 n_3 n_2 n_1 + k_2 n_4 n_3 n_2 n_1 + k_4 n_3 n_2 n_1 + k_3 n_2 n_1 + k_5 n_1 + k_6 \tag{3.117}$$

This reordering $P_1$ thus places each element of X(k) in the correct segment of length $N/(n_1 n_2)$, grouped in "subsequences" of $n_1 n_2$ consecutive elements (Singleton, 1977). The next reordering $P_2$ then finishes the reordering of the subsequences within each $N/(n_1 n_2)$ segment.

The above factorization is used in the Singleton and IMSL mixed radix algorithms and generates a complicated FORTRAN code. A simpler alternative factorization was written by the author and used in his mixed radix algorithm. The simpler algorithm requires an additional two arrays of length N to store the intermediate results, which detracts from the algorithm's utility when longer sequences are transformed. The details of this factorization are presented in Appendix E for interested readers.

3.4.3 Twiddle Factors. In Section 3.4.1 the factoring into $F_i$ was described corresponding to a factor $n_i$. $F_i$ can be factored to give a product $R_i T_i$, where the matrix $T_i$ is one of $N/n_i$ identical Fourier transforms of dimension $n_i$ and $R_i$ is a diagonal twiddle factor matrix. The elements of $R_i$ are specified by the decimation-in-frequency version of the FFT (Singleton, 1977).

The twiddle factor matrix $R_i$ multiplies each transform $T_i$ of dimension $n_i$ by $e^{j\theta}$, where θ is an angle from the set:

$$0,\; Z,\; 2Z,\; \ldots,\; (n_i - 1)Z \tag{3.118}$$

and Z is an integer multiple of $2\pi/N$. No multiplication is needed for the zero angle, which gives at most $N(n_i - 1)/n_i$ complex multiplications for each factor $n_i$ (Singleton, 1977). The count can be reduced further because some of the transforms need no twiddles at all; for a factorization $N = n_m n_{m-1} \cdots n_1$, the total number of twiddle factors for an N length sequence is:

$$\sum_{i=1}^{m} \frac{N(n_i - 1)}{n_i} - (N - 1) \tag{3.119}$$

This result is used in determining the number of real multiplications and additions required by an N length FFT.
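Eq (3.119) can be checked by brute force. The sketch below assumes a decimation-in-time schedule in which stage i combines blocks into spans of n_1 * ... * n_i points and the twiddle on input q of butterfly j is W^(jq) for that span; it counts the twiddles with a nonzero angle and compares the count with the closed form. The schedule and function names are illustrative assumptions, not Singleton's code.

```python
from math import prod

def twiddles_eq_3_119(factors):
    """Eq (3.119): sum over factors of N(n_i - 1)/n_i, minus (N - 1)."""
    n = prod(factors)
    return sum(n * (f - 1) // f for f in factors) - (n - 1)

def twiddles_counted(factors):
    """Count nontrivial twiddle multiplications stage by stage.

    Assumed schedule: stage i merges blocks of length span/n_i into blocks of
    length span = n_1 * ... * n_i, multiplying input q of butterfly j by
    W_span**(j*q). Only j*q = 0 needs no multiplication.
    """
    n = prod(factors)
    count, span = 0, 1
    for f in factors:
        span *= f
        butterflies_per_block = span // f     # j = 0 .. span/f - 1
        blocks = n // span
        for j in range(butterflies_per_block):
            for q in range(1, f):            # q = 0 is never twiddled
                if j * q != 0:
                    count += blocks
    return count

for factors in ([2] * 10, [5, 3, 2], [3, 2, 3, 5, 3]):
    print(factors, twiddles_eq_3_119(factors), twiddles_counted(factors))
```

Because the zero-angle twiddles of stage i number $(n_i - 1)N/(n_1 \cdots n_i)$, their sum over all stages telescopes to exactly N - 1, which is why the two counts agree for any factor ordering.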

3.4.4 Real Operations Count for Computing Sine and Cosine Using Difference Equations. Recall from Section 3.1 that the trigonometric values used in an FFT can be computed using the difference equations:

$$\cos((k+1)a) = \big(C\cos(ka) - S\sin(ka)\big) + \cos(ka) \tag{3.120}$$

$$\sin((k+1)a) = \big(C\sin(ka) + S\cos(ka)\big) + \sin(ka) \tag{3.121}$$

where

$a = 2\pi/N$ radians
$C = -2\sin^2(a/2)$
$S = \sin(a)$
$\cos(0) = 1$
$\sin(0) = 0$

In the case of the author's mixed radix FFT, the difference equations are computed N times and the sine and cosine results are stored in two lookup tables. The difference equations are given by:

WKC(I) = C*WKC(I-1) - S*WKS(I-1) + WKC(I-1)          (3.122)

WKS(I) = C*WKS(I-1) + S*WKC(I-1) + WKS(I-1)          (3.123)

Eqs (3.122) and (3.123) require 4 real multiplications and 10 real additions each time they are computed. Given that they are computed N times, the operations count is given by:

$$\text{real mult} = 4N \tag{3.124}$$

$$\text{real adds} = 10N \tag{3.125}$$
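The recursion of Eqs (3.122) and (3.123) is easy to try out. This sketch (the name `trig_tables` is hypothetical) builds the two lookup tables the way the author's FFT is described as doing, starting from cos(0) = 1 and sin(0) = 0, then measures how far the recursively generated values drift from the library sine and cosine for N = 1024.

```python
import math

def trig_tables(n):
    """Cosine and sine tables of length n built with the difference equations
    (3.120)-(3.123); only entry 0 is set exactly."""
    a = 2.0 * math.pi / n
    c = -2.0 * math.sin(a / 2.0) ** 2   # C = -2 sin^2(a/2)
    s = math.sin(a)                      # S = sin(a)
    wkc = [1.0] * n                      # cos(0) = 1
    wks = [0.0] * n                      # sin(0) = 0
    for i in range(1, n):
        wkc[i] = c * wkc[i - 1] - s * wks[i - 1] + wkc[i - 1]   # Eq (3.122)
        wks[i] = c * wks[i - 1] + s * wkc[i - 1] + wks[i - 1]   # Eq (3.123)
    return wkc, wks

N = 1024
wkc, wks = trig_tables(N)
max_error = max(
    max(abs(wkc[i] - math.cos(i * 2 * math.pi / N)),
        abs(wks[i] - math.sin(i * 2 * math.pi / N)))
    for i in range(N)
)
print(f"largest deviation from math.cos/math.sin: {max_error:.1e}")
```

Writing the update as a small correction C·cos + S·sin added to the previous value, with C formed from $\sin^2(a/2)$ rather than $\cos(a) - 1$, keeps the accumulated rounding error small in double precision.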

The IMSL and Singleton FFTs do not use sine and cosine lookup tables, in order to save memory arrays. Instead, the sine and cosine values are computed as needed in the FFT program, resulting in an intricate FORTRAN code. It was determined from the FORTRAN coded IMSL and Singleton FFTs that both utilize the same method of computing the sine and cosine difference equations. For this reason only the Singleton FFT algorithm was studied.

An algorithm which computes the number of real operations required was derived from "counters" placed in the FFT FORTRAN code in Appendix F. The counters provided the number of times that each section of the FFT subroutine was used to compute the sine and cosine values for different values of N. The labels for the counters are shown below, along with the lines of FORTRAN code where they were positioned; the lines of code are shown in Appendix F.


I2C: Counter for the radix-2 difference equation in lines 2330 - 2340.

I2CL: Counter for the radix-2 sine and cosine library calls in lines 2650 - 2660.

I4C1: Counter for the radix-4 section which computes the sine and cosine terms of a leg of the radix-4 in lines 3030 - 3040. Refer to Figure 3.19, which shows the radix-4 butterfly flowgraph.

I4C2: Counter for the radix-4 section which computes the sine and cosine terms of the $W_N^{2k}$ and $W_N^{3k}$ legs of the butterfly flowgraph in lines 3140 - 3170.

I4CL: Counter for the radix-4 sine and cosine library calls in lines 3690 - 3700.

IGTF: Counter for the general twiddle factors section in lines 4990 - 5000, which computes the sine and cosine for the first leg of the general radix-p FFT.

IGTFE: Counter for the general twiddle factors section which computes the sine and cosine for the remainder of the radix-p butterfly legs in lines 5170 - 5190.

IGTFL: Counter for the general radix-p sine and cosine library calls in lines 5290 - 5300.

Data was collected for over 70 values of N using these counters. A subset of the values were the 59 permissible sequence lengths of the PFA and WFTA. Based on the results of these tests and study of the FORTRAN code FFT in Appendix F, the general expressions for these counters were determined.

Given that:

N = sequence length
NFAC(i) = factors of N (as factored by the Singleton subroutine)
M = number of factors of N
KSPAN_i = N/(NFAC(1) * NFAC(2) * ... * NFAC(i-1))

then

$$\mathrm{I2C}_i = \begin{cases} (\mathrm{KSPAN}_i - 3)/2, & \mathrm{KSPAN}_i \ge 4 \text{ and odd} \\ (\mathrm{KSPAN}_i - 2)/2, & \mathrm{KSPAN}_i \ge 4 \text{ and even} \\ 0, & \mathrm{KSPAN}_i < 4 \end{cases}$$

For the factors of 2 in N, the expression for I2C becomes:

$$\mathrm{I2C} = \sum_{i} \mathrm{I2C}_i \tag{3.126}$$

where the sum is taken over the factors of 2 in N.


The expression for the number of sine and cosine calls during computation of a factor of 2 is $[\mathrm{KSPAN}_i/32]$, where $[\cdot]$ represents truncation of the result inside the brackets. Using the "truncation" notation:

$$\mathrm{I2CL} = \sum_{i} \left[\mathrm{KSPAN}_i/32\right] \tag{3.127}$$


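The radix-4 section uses the same notational conventions for KSPAN and truncation. The expressions for I4C1, I4C2, and I4CL become:

$$\mathrm{I4C2}_i = \mathrm{KSPAN}_i - 1 \tag{3.128}$$

$$\mathrm{I4CL}_i = \left[\mathrm{KSPAN}_i/32\right] \tag{3.129}$$

$$\mathrm{I4C1}_i = \mathrm{I4C2}_i - \mathrm{I4CL}_i \tag{3.130}$$

For all factors of 4 in N the expressions become:

$$\mathrm{I4C2} = \sum_{i=1}^{k} (\mathrm{KSPAN}_i - 1) \tag{3.131}$$

$$\mathrm{I4CL} = \sum_{i=1}^{k} \left[\mathrm{KSPAN}_i/32\right] \tag{3.132}$$

$$\mathrm{I4C1} = \mathrm{I4C2} - \mathrm{I4CL} \tag{3.133}$$

where there are k factors of 4 in N.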

The general expressions for IGTF, IGTFE, and IGTFL were derived to be:

$$\mathrm{IGTFL}_i = \left[\mathrm{KSPAN}_i/32\right] \tag{3.134}$$

$$\mathrm{IGTF}_i = \mathrm{KSPAN}_i \cdots \tag{3.135}$$

$$\mathrm{IGTFE}_i = \cdots\,(\mathrm{NFAC}(i) - 1) \tag{3.136}$$

The result for the general radix-p section becomes:

$$\mathrm{IGTF} = \sum_{i=1}^{k} \mathrm{IGTF}_i \tag{3.137}$$

$$\mathrm{IGTFL} = \sum_{i=1}^{k} \mathrm{IGTFL}_i \tag{3.138}$$

$$\mathrm{IGTFE} = \sum_{i=1}^{k} \mathrm{IGTFE}_i \tag{3.139}$$

Eqs (3.126) through (3.139) were programmed in FORTRAN and then tabulated as a function of N in Table 3.4. These results identically match the tests conducted using the counters.

Examining the FORTRAN code where the counters were located gives the number of operations performed each time one of the counters was incremented. These results are presented in Table 3.5 for all the counters. The number of real operations, sine and cosine library calls, and exponentiations can be determined for all N length sequences by using Tables 3.4 and 3.5. The general expressions are given by:

$$\mathrm{RADD} = 4(\mathrm{I2C} + \mathrm{I4C2} + \mathrm{IGTF}) + 3(\mathrm{I4C1}) + 2(\mathrm{IGTFE}) \tag{3.140}$$

$$\mathrm{RMULT} = 4(\mathrm{I2C} + \mathrm{I4C2} + \mathrm{IGTF} + \mathrm{IGTFE}) + 6(\mathrm{I4C1}) \tag{3.141}$$

$$\mathrm{REXP} = 2(\mathrm{I4C1}) \tag{3.142}$$
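Eqs (3.140) through (3.142) are a weighted sum of the counter totals using the per-increment costs of Table 3.5. The sketch below (names hypothetical) encodes that table; the counter values in the example are made up purely to exercise the arithmetic, since the actual totals come from Table 3.4.

```python
# Per-increment costs from Table 3.5:
# (real adds, real mults, exponentiations, sine calls, cosine calls)
COST = {
    "I2C":   (4, 4, 0, 0, 0),
    "I2CL":  (0, 0, 0, 1, 1),
    "I4C1":  (3, 6, 2, 0, 0),
    "I4C2":  (4, 4, 0, 0, 0),
    "I4CL":  (0, 0, 0, 1, 1),
    "IGTF":  (4, 4, 0, 0, 0),
    "IGTFE": (2, 4, 0, 0, 0),
    "IGTFL": (0, 0, 0, 1, 1),
}

def trig_operation_totals(counts):
    """Weight counter totals by Table 3.5; returns
    (RADD, RMULT, REXP, sine calls, cosine calls)."""
    totals = [0] * 5
    for name, value in counts.items():
        for j, per_increment in enumerate(COST[name]):
            totals[j] += per_increment * value
    return tuple(totals)

example_counts = {"I2C": 10, "I4C1": 3, "I4C2": 5, "IGTF": 2, "IGTFE": 7,
                  "I2CL": 1, "I4CL": 2, "IGTFL": 0}   # made-up values
print(trig_operation_totals(example_counts))  # (91, 114, 6, 3, 3)
```

The first three entries agree with Eqs (3.140) through (3.142) applied to the same counts; the last two simply add up the library calls.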

3.4.5 Real Operations Count for Mixed Radix FFTs. The real operations count is derived from the number of complex twiddle factors, the number of butterflies, and the number of sine and cosine terms computed using difference equations.


TABLE 3.5  OPERATIONS FOR EACH COUNTER

Counter   Real Add   Real Mult   Exponentiation   Sine Calls   Cosine Calls
I2C           4          4              0              0             0
I2CL          0          0              0              1             1
I4C1          3          6              2              0             0
I4C2          4          4              0              0             0
I4CL          0          0              0              1             1
IGTF          4          4              0              0             0
IGTFE         2          4              0              0             0
IGTFL         0          0              0              1             1
Given that N is factored as:

$$N = p_1 p_2 \cdots p_m \tag{3.143}$$

the number of twiddle factors has been shown (Singleton, 1969) to be:

$$\sum_{i=1}^{m} \frac{N(p_i - 1)}{p_i} - (N - 1) \tag{3.144}$$

where m is the total number of factors of N. The number of butterflies required for an N length sequence is given by:

$$\sum_{i=1}^{m} \frac{N}{p_i} \tag{3.145}$$

The total real operations count is determined by adding (a) the number of real multiplications and additions required per butterfly times Eq (3.145), plus (b) the complex twiddle factor multiplications times Eq (3.144), plus (c) the number of additions and multiplications given by Eqs (3.140) and (3.141).

Assuming a complex multiplication requires four real multiplications and two additions, a general expression for the real operations count can be determined for the mixed radix FFTs.

Singleton's mixed radix algorithm contains special transform sections for factors of 2, 3, 4, and 5, as well as a general section for other odd factors. This requires that N be represented as:

$$N = 2^r\, 3^s\, 4^t\, 5^u\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k} \tag{3.146}$$
The IMSL mixed radix FFT (FFTCC) does not have a special section for factors of 5 and uses the general section to transform these factors. The author's mixed radix FFT (FFTMR) has sections for 2, 3, 4, and 5 but does not have the general transform section. Only the detailed development of the operations count for Singleton's algorithm is presented here, because the other two algorithms are subsets thereof. The general expressions for real operations versus N are given for the other two algorithms in Appendices G and H.

The radix-2 section of the FORTRAN code for Singleton's algorithm is shown in Figure 3.20. For factors of two, the twiddle (rotation) factor complex multiplications are computed in this section rather than in the "general rotation section" to reduce the array indexing required. Using Eq (3.145), the total number of butterflies for r factors of 2 is rN/2, and from Eq (3.144) the total number of twiddle factors is rN/2 (neglecting the -(N-1) term, which will be subtracted once the complete real operations count for all factors has been developed). The transform for a factor of 2 (refer to Figure 3.20) is computed in lines 2200-2230 using 4 real additions if no twiddles are required, or in lines 2450-2500 if twiddles are necessary. The general expression for factors of two becomes:

$$\text{real mult} = 4(rN/2) = 2rN \tag{3.147}$$

$$\text{real adds} = 4(rN/2) + 2(rN/2) = 3rN \tag{3.148}$$

The factors of 3 section shown in Figure 3.21 performs only the butterfly and uses the general rotation (twiddle) section to twiddle the data (the general twiddle factor section is shown in Figure 3.24). Using Eqs (3.144) and (3.145), the number of butterflies for s factors of 3 is sN/3 and the number of complex twiddles is s(2N/3). Examining lines 2760-2870 in Figure 3.21 shows 4 real multiplications and 12 real additions per butterfly. Each complex twiddle requires 4 real multiplications and 2 real additions. The expression for the factors of 3 section becomes:

$$\text{real mult} = 4(N/3)s + 4(2/3)Ns = 4sN \tag{3.149}$$

$$\text{real adds} = 12(N/3)s + 2(2/3)Ns = 16sN/3 \tag{3.150}$$

The factors of 4 section in Figures 3.22a and b includes the twiddles in the butterfly section to minimize array indexing. The number of butterflies computed for t factors of 4 is tN/4 and the number of complex twiddles is t(3N/4), from Eqs (3.144) and (3.145). From lines 3210-3320 and 3540-3570, the number of real additions per butterfly is 16. Every complex twiddle requires 4 real multiplications and 2 additions. Combining the butterfly and twiddle operations results in the general expression for factors of 4:

$$\text{real mult} = 4(3N/4)t = 3tN \tag{3.151}$$

$$\text{real adds} = 2(3N/4)t + 16(N/4)t = 3tN/2 + 8tN/2 = 11tN/2 \tag{3.152}$$

The transform section for factors of 5 shown in Figure 3.23 computes the butterflies for the u factors of 5. There are uN/5 butterflies and u(4N/5) complex twiddles based on Eqs (3.144) and (3.145). Examination of lines 3720-4090 in Figure 3.23 shows that 16 real multiplications and 32 real additions are required per butterfly. Combining the butterfly and complex twiddle operations provides the general expression for real operations for factors of 5:

$$\text{real mult} = 16(N/5)u + 4(4N/5)u = 32uN/5 \tag{3.153}$$

$$\text{real adds} = 32(N/5)u + 2(4N/5)u = 8uN \tag{3.154}$$

where u is the number of factors of 5 in N.
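The special-section totals of Eqs (3.147) through (3.154) can be combined for any $N = 2^r 3^s 4^t 5^u$. This sketch (the name `special_section_ops` is hypothetical) sums them. The optional subtraction of the deferred -(N-1) twiddle term, at 4 real multiplications and 2 additions per complex twiddle, is an assumption based on the text rather than a reproduction of Singleton's final expression, and sine and cosine costs are not included.

```python
def special_section_ops(r, s, t, u, subtract_trivial_twiddles=False):
    """Real mults and adds of Singleton's radix-2, 3, 4 and 5 sections,
    Eqs (3.147)-(3.154), for N = 2**r * 3**s * 4**t * 5**u."""
    n = 2 ** r * 3 ** s * 4 ** t * 5 ** u
    # Each fraction is exact: 3 | N when s > 0, 2 | N when t > 0, 5 | N when u > 0.
    mults = 2 * r * n + 4 * s * n + 3 * t * n + 32 * u * n // 5
    adds = 3 * r * n + 16 * s * n // 3 + 11 * t * n // 2 + 8 * u * n
    if subtract_trivial_twiddles:
        # Assumption: remove the -(N-1) twiddle term the text defers, charging
        # each complex twiddle 4 real mults and 2 real adds as stated above.
        mults -= 4 * (n - 1)
        adds -= 2 * (n - 1)
    return n, mults, adds

print(special_section_ops(1, 1, 0, 1))
print(special_section_ops(1, 1, 0, 1, subtract_trivial_twiddles=True))
```

For N = 30 (r = s = u = 1) the section totals are 372 multiplications and 490 additions, or 256 and 432 after the subtraction.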

The general transform section for odd prime factors is more complex than the special factor sections. To aid in describing the number of real operations, a p-radix is defined such that p is an odd prime greater than 5 with an associated integer power "mi". The real operations count for the general section does not include additions associated with array indexing, nor does it count the multiplications and additions needed to recursively compute the sine and cosine terms.

Based on the FORTRAN program for the odd factors shown in Figures 3.24a and b, there are five sources of real operations for each factor $p_i$. The first source, shown in lines 4310-4360, is the computation of the $(p_i-1)/2$ complex multipliers for the butterfly legs, which requires:

$$\text{real mult} = 4\,\frac{p_i-1}{2} = 2(p_i-1) \tag{3.155}$$

$$\text{real adds} = 2\,\frac{p_i-1}{2} = p_i-1 \tag{3.156}$$

These multipliers are computed only once for each distinct prime factor $p_i$. For a factor of 7, for example, $(7-1)/2$ complex multipliers are required; if $N$ contains several factors of 7, still only $(7-1)/2$ complex multipliers are needed.

The second source of real operations is the computation of the butterfly transmittances that require only real additions. From Eq (3.145) there are $m_iN/p_i$ butterflies for the $m_i$ factors of $p_i$. Each butterfly has $(p_i-1)/2$ transmittances that require only real additions, and lines 4470-4540 in Figure 3.24a show that each of these transmittances requires 6 additions. Combining these results produces the general expression for the real additions:

$$\text{real adds} = 6\,\frac{p_i-1}{2}\cdot\frac{m_iN}{p_i} = \frac{3Nm_i(p_i-1)}{p_i} \tag{3.157}$$

The third source of operations is the $(p_i-1)^2/4$ butterfly transmittances that require real multiplications and additions. Lines 4510-4750 in Figure 3.24b show that each requires 4 real multiplications and 4 real additions. Combining this with the number of transmittances and butterflies gives:

$$\text{real mult} = 4\left(\frac{m_iN}{p_i}\right)\left(\frac{(p_i-1)^2}{4}\right) = \frac{m_iN(p_i-1)^2}{p_i} \tag{3.158}$$

$$\text{real adds} = \frac{m_iN(p_i-1)^2}{p_i} \tag{3.159}$$
For the fourth source of real operations, the corresponding lines of Figure 3.24b show that this function requires 4 real additions, performed $(p_i-1)/2$ times in each butterfly. Combining these results gives the total as:

$$\text{real adds} = \left(\frac{m_iN}{p_i}\right)4\,\frac{p_i-1}{2} = \frac{2m_iN(p_i-1)}{p_i} \tag{3.160}$$

The final source of real operations, shown in Figure 3.24b lines 5120-5140, is the complex twiddle multiplication. From Eq (3.144) there are $m_iN(p_i-1)/p_i$ complex twiddles, which gives the general expressions:

$$\text{real mult} = \frac{4m_iN(p_i-1)}{p_i} \tag{3.161}$$

$$\text{real adds} = \frac{2m_iN(p_i-1)}{p_i} \tag{3.162}$$

Combining Eqs (3.145) through (3.162) gives the expressions for the real operations in the general odd-factors section:

$$\text{real mult} = \sum_{i=1}^{k}\left[2(p_i-1) + \frac{m_iN(p_i-1)^2}{p_i} + \frac{4m_iN(p_i-1)}{p_i}\right] \tag{3.163}$$

$$\begin{aligned}\text{real adds} &= \sum_{i=1}^{k}\left[(p_i-1) + \frac{3Nm_i(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i} + \frac{2m_iN(p_i-1)}{p_i} + \frac{2m_iN(p_i-1)}{p_i}\right]\\ &= \sum_{i=1}^{k}\left[(p_i-1) + \frac{7Nm_i(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\right]\end{aligned} \tag{3.164}$$
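As a check on the algebra leading to Eqs (3.163) and (3.164), the sketch below adds up the five sources one at a time for a single prime and compares the total with the closed forms. The function names are hypothetical; the per-operation costs are exactly those quoted in the text, and rational arithmetic avoids rounding.

```python
from fractions import Fraction


def odd_factor_sources(n, p, m):
    """Sum the five sources of real operations for m factors of an odd prime p.

    Returns (real mult, real adds) built term by term from Eqs (3.155)-(3.162);
    array indexing and the sine/cosine recursion are not counted.
    """
    n = Fraction(n)
    half = Fraction(p - 1, 2)            # (p-1)/2
    butterflies = m * n / p              # m*N/p butterflies, Eq (3.145)
    twiddles = m * n * (p - 1) / p       # m*N*(p-1)/p complex twiddles, Eq (3.144)

    # Source 1: (p-1)/2 complex multipliers, computed once per distinct prime.
    mult = 4 * half
    adds = 2 * half
    # Source 2: (p-1)/2 addition-only transmittances, 6 additions each.
    adds += 6 * half * butterflies
    # Source 3: (p-1)^2/4 transmittances with 4 mults and 4 adds each.
    mult += 4 * half * half * butterflies
    adds += 4 * half * half * butterflies
    # Source 4: 4 additions, performed (p-1)/2 times per butterfly.
    adds += 4 * half * butterflies
    # Source 5: complex twiddles at 4 mults and 2 adds each.
    mult += 4 * twiddles
    adds += 2 * twiddles
    return mult, adds


def odd_factor_closed_form(n, p, m):
    """Closed forms of Eqs (3.163) and (3.164) for a single prime p."""
    n = Fraction(n)
    mult = 2 * (p - 1) + m * n * (p - 1) ** 2 / p + 4 * m * n * (p - 1) / p
    adds = (p - 1) + 7 * n * m * (p - 1) / p + m * n * (p - 1) ** 2 / p
    return mult, adds
```

Equality of the two functions reflects the $3 + 2 + 2 = 7$ collapse of the linear addition terms in Eq (3.164).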
Assuming that the sequence length can be factored as $N = 2^r\,3^s\,4^t\,5^u\,p_1^{m_1}p_2^{m_2}\cdots p_k^{m_k}$, the expressions for the total number of real operations can be written using Eqs (3.140) through (3.164) as:

$$\text{real mult} = 2rN + 4sN + 3tN + \frac{32uN}{5} + \sum_{i=1}^{k}\left[2(p_i-1) + \frac{m_iN(p_i-1)^2}{p_i} + \frac{4m_iN(p_i-1)}{p_i}\right] - 4(N-1) + \text{KMULT} \tag{3.165}$$

$$\text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN + \sum_{i=1}^{k}\left[(p_i-1) + \frac{7Nm_i(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\right] - 2(N-1) + \text{KADD} \tag{3.166}$$
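A direct transcription of Eqs (3.165) and (3.166) is handy for tabulating counts like those plotted later in this section. The sketch below is hypothetical (the name `singleton_real_ops` and its argument layout are illustrative); KMULT and KADD are passed in because the text does not give them in closed form, and they default to zero here.

```python
from fractions import Fraction
from math import prod


def singleton_real_ops(r, s, t, u, odd_primes=None, kmult=0, kadd=0):
    """Return (N, real mult, real adds) from Eqs (3.165) and (3.166).

    N = 2**r * 3**s * 4**t * 5**u * prod(p**m for p, m in odd_primes.items()),
    where every key of odd_primes is an odd prime greater than 5.
    kmult/kadd stand in for KMULT/KADD (sine/cosine recursion).
    """
    odd_primes = odd_primes or {}
    n = 2**r * 3**s * 4**t * 5**u * prod(p**m for p, m in odd_primes.items())
    N = Fraction(n)
    mult = 2 * r * N + 4 * s * N + 3 * t * N + Fraction(32, 5) * u * N
    adds = 3 * r * N + Fraction(16, 3) * s * N + Fraction(11, 2) * t * N + 8 * u * N
    for p, m in odd_primes.items():
        mult += 2 * (p - 1) + m * N * (p - 1) ** 2 / p + 4 * m * N * (p - 1) / p
        adds += (p - 1) + 7 * N * m * (p - 1) / p + m * N * (p - 1) ** 2 / p
    # The first decimation-in-time stage needs no twiddles.
    mult += -4 * (N - 1) + kmult
    adds += -2 * (N - 1) + kadd
    return n, mult, adds
```

For single-stage lengths such as N = 2, 3, 4 and 5, the subtraction of $4(N-1)$ and $2(N-1)$ removes every twiddle, so the result reduces to one butterfly's cost (for example 16 multiplications and 32 additions for N = 5).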

Notice that Eqs (3.165) and (3.166) subtract $4(N-1)$ real multiplications and $2(N-1)$ real additions from the totals because the first stage of a decimation-in-time FFT does not require twiddle factors (likewise for the last stage of a decimation-in-frequency FFT). These equations also include KMULT and KADD, the real multiplications and additions required to compute the sines and cosines with the recursive difference equation.

Similar derivations were performed for the IMSL FFT and the author's FFT; because they are largely redundant, they are given in Appendices G and E, respectively. The general expression for the real operations required by the IMSL mixed radix FFT (where $N = 2^r\,3^s\,4^t\,p_1^{m_1}p_2^{m_2}\cdots p_k^{m_k}$) is given by:

$$\text{real mult} = 2rN + \cdots + 3tN + \sum_{i=1}^{k}\left[2(p_i-1) + \frac{4m_iN(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\right] - 4(N-1) + \text{KMULT} \tag{3.167}$$

$$\text{real adds} = 3rN + 6sN + \frac{11tN}{2} + \sum_{i=1}^{k}\left[(p_i-1) + \frac{8m_iN(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\right] - 2(N-1) + \text{KADD} \tag{3.168}$$

where KMULT and KADD are the multiplications and additions needed to compute the sine and cosine terms. The general expression for the real operations required by the author's mixed radix FFT (where $N = 2^r\,3^s\,4^t\,5^u$) is given by:

$$\text{real mult} = 2rN + 4sN + 3tN + \frac{32uN}{5} - 4(N-1) + 4N \tag{3.169}$$

$$\text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN - 2(N-1) + 10N \tag{3.170}$$

The real operations count for Singleton's mixed radix FFT is shown for $N \le 200$ in Figures 3.25 and 3.26. The operations count plotted includes only the additions and multiplications for the butterflies and twiddle factors, in order to demonstrate the $N^2$ "upper bound" and the $N\log_2 N$ "lower bound". The $N^2$ upper bound occurs in the mixed radix FFTs when a prime length must be transformed. The $N\log_2 N$ lower bound is reached when $N$ is a power of 2. Between the $N^2$ and $N\log_2 N$ bounds there are other "bounds," which are observed in Figure 3.25. The dashed lines represent lengths which are not prime but are not highly factorable either. The dashed lines approach $N\log_2 N$ as $N$ becomes more factorable.

Figure 3.25. Multiplications vs N for Singleton's FFT.

The relative efficiency of radix-2, 3, 4 and 5 FFTs is observed in Figures 3.27 and 3.28. These figures plot real operations counts for the mixed radix FFT for $N$ less than 250 (where $N$ contains only factors of 2, 3, 4 and 5) and annotate the integer powers of 2, 3, 4 and 5. Notice that the fixed radix-2 and radix-4 lengths provide the "lower bound" and the radix-3 and radix-5 lengths provide the "upper bound" on the number of real operations; that is, integer powers of 2 and 4 require the fewest real operations and integer powers of 3 and 5 the most. Other combinations of factors, e.g., $N = 120 = 5\cdot4\cdot3\cdot2$, have real operations counts that fall between the "bounds".
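The ordering of the radices can be seen from the leading terms of Eqs (3.165) and (3.166) alone. The Python sketch below divides the per-point cost of one factor by the bits that factor contributes ($\log_2$ of the radix). It is a rough comparison under a stated simplification: the $-4(N-1)$ and $-2(N-1)$ corrections and KMULT/KADD are ignored, and the table and function names are illustrative.

```python
import math

# Leading real-operation coefficients per point and per factor, taken from
# Eqs (3.165) and (3.166): e.g. each factor of 4 costs 3N mults and 11N/2 adds.
MULT_PER_FACTOR = {2: 2.0, 3: 4.0, 4: 3.0, 5: 32 / 5}
ADDS_PER_FACTOR = {2: 3.0, 3: 16 / 3, 4: 11 / 2, 5: 8.0}


def cost_per_bit(radix):
    """Real (mults, adds) per point per log2(N) for N = radix**k.

    Leading terms only: the twiddle-free first stage and KMULT/KADD are ignored.
    """
    bits = math.log2(radix)
    return MULT_PER_FACTOR[radix] / bits, ADDS_PER_FACTOR[radix] / bits


for radix in (2, 3, 4, 5):
    mult, adds = cost_per_bit(radix)
    print(f"radix {radix}: {mult:.2f} mults, {adds:.2f} adds per point per bit")
```

Run as written, it ranks radix-4 cheapest, then radix-2, radix-3 and radix-5, for both multiplications and additions, which matches the bounds described above.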

3.4.6 Memory Requirements for Mixed Radix FFTs.

As in the case of fixed radix algorithms, a major consideration in selecting a particular mixed radix algorithm is the memory required to execute the FFT subroutine, given the memory storage limitations of the computer to be used. The memory requirements for the three mixed radix FFTs are given here as a function of the sequence length N. Each algorithm has program and memory array requirements, which are listed below.

All the algorithms were compiled on the CDC Cyber system at AFIT, and the program memory required by each subroutine was determined from a "load map" generated by the command MAP,PART. This load map gives the size of all programs used during execution. The array storage requirements were determined from the FORTRAN coded programs and the reference material provided with the IMSL and Singleton FFT subroutines. The general expression for the memory requirements of each FFT subroutine (as a function of N) is given below.

Figure 3.27. Multiplications vs N for Multiples of 2, 3, 4 and 5.

Figure 3.28. Additions vs N for Multiples of 2, 3, 4 and 5.

The subroutine written by the author (FFTMR) requires 899 words of program memory. It also requires the calling program to dimension 6 arrays (A, B, AT, BT, WKS, and WKC) to length N. (Use of these arrays is explained in Appendix E.) This gives the total memory array required as:

$$\text{FFTMR memory} = 6N \tag{3.171}$$

The mixed radix subroutine written by Singleton (FFTSNG) requires 1100 words of program memory. Four arrays (AT, BT, CK, SK) are dimensioned to equal the maximum prime factor of N; if there are no prime factors greater than 5, these arrays may be reduced to length 1. A fifth array (NP) is dimensioned to at least one less than the product K of the square-free factors (see Glossary) of N. If N contains at most one square-free factor, this array can be reduced to M + 1, where M is the maximum number of prime factors of N. Two more arrays (XR and XI) are dimensioned to length N. The total memory array storage becomes:

$$\text{FFTSNG memory} = 2N + 4\cdot\text{MAXPF} + (K-1 \text{ or } M+1) \tag{3.172}$$
where

N = sequence length
MAXPF = maximum prime factor of N
K = product of the square-free factors
M = maximum number of prime factors

NOTE: K-1 or M+1 is selected in Eq (3.172) based on the number of square-free factors of N, as described in the preceding paragraph.
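Eq (3.172) is easy to evaluate once N is factored. The sketch below is a minimal illustration, not Singleton's code. It assumes that each prime with an odd exponent leaves one square-free factor (following the Glossary example) and counts M with multiplicity, which is how the N = 2100 example in this section obtains M = 6. The helper names are hypothetical.

```python
def prime_factors(n):
    """Prime factorization of n >= 2 as {prime: exponent}, by trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def fftsng_array_memory(n):
    """Array words for FFTSNG from Eq (3.172), under the assumptions stated above."""
    f = prime_factors(n)
    maxpf = max(f)                       # MAXPF: largest prime factor
    m = sum(f.values())                  # M: prime factors counted with multiplicity
    square_free = [p for p, e in f.items() if e % 2 == 1]  # primes left unpaired
    k = 1
    for p in square_free:
        k *= p                           # K: product of the square-free factors
    np_length = k - 1 if len(square_free) > 1 else m + 1
    return 2 * n + 4 * maxpf + np_length


print(fftsng_array_memory(2100))  # 4248, as in Eq (3.179)
```

Adding the 1100 words of program memory to the array count gives the total memory on the Cyber 74.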

The mixed radix subroutine (FFTCC) provided as part of the IMSL package on the CDC Cyber system requires 1061 words of program memory. A complex array (A) must be dimensioned to length N, and two other arrays (IWK and WK) are dimensioned to length IWORD, where:

$$\text{IWORD} = 3M + 3 + \max\left(4M + 7 + 6K,\; KB + 1 + 2\,JK\right) \tag{3.173}$$

To define the quantities M, K, KB and JK, a prime factor decomposition of N is required such that

$$N = f_1^2\,f_2^2\cdots f_{KT}^2\;f_{KT+1}\cdots f_{KT+JT}$$

where each $f_i$ is a prime number (other than 1), $f_i \ne f_r$ for $i \ne r$ whenever $i, r \ge KT+1$, and $KT \ge 0$, $JT \ge 0$. Then

$$M = 2KT + JT \tag{3.174}$$

is the number of prime factors in N, and

$$K = \max_{1 \le j \le KT+JT} f_j \tag{3.175}$$

is the largest prime factor of N. KB and JK are defined as follows:

$$JK = 1\cdot f_1\cdot f_2 \cdots f_{KT} \tag{3.176}$$

where JK = 1 if KT = 0, and

$$KB = \frac{N}{JK^2} - 2 \tag{3.177}$$

Once M, K, JK, and KB are determined, they are substituted into Eq (3.173) to determine the value of IWORD, the actual work storage requirement. Counting only the arrays for the work vectors (IWK and WK) and the data arrays (A and B) gives the total array memory required for the IMSL FFT:

$$\text{Memory} = 2N + 2\cdot\text{IWORD} \tag{3.178}$$
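The IMSL bookkeeping of Eqs (3.173) through (3.178) can be sketched the same way. In this hypothetical helper, each prime power $p^e$ contributes $\lfloor e/2\rfloor$ squared factors (counted in KT) and, when e is odd, one unpaired factor (counted in JT); this reading of the decomposition is an assumption consistent with the definitions above, and the function names are illustrative.

```python
def prime_factors(n):
    """Prime factorization of n >= 2 as {prime: exponent}, by trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def fftcc_iword(n):
    """IWORD from Eqs (3.173)-(3.177)."""
    f = prime_factors(n)
    kt = sum(e // 2 for e in f.values())     # number of squared factors f_j**2
    jt = sum(e % 2 for e in f.values())      # number of unpaired factors
    m = 2 * kt + jt                          # Eq (3.174)
    k = max(f)                               # Eq (3.175)
    jk = 1
    for p, e in f.items():
        jk *= p ** (e // 2)                  # Eq (3.176); stays 1 when KT = 0
    kb = n // (jk * jk) - 2                  # Eq (3.177); JK**2 divides N exactly
    return 3 * m + 3 + max(4 * m + 7 + 6 * k, kb + 1 + 2 * jk)   # Eq (3.173)


def fftcc_array_memory(n):
    """Array words for FFTCC from Eq (3.178)."""
    return 2 * n + 2 * fftcc_iword(n)
```

For N = 2100 this gives KT = 2, JT = 2, M = 6, K = 7, JK = 10 and KB = 19, the intermediate values used in the worked example that follows.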

An example of N = 2100 is used to demonstrate the use of Eqs (3.172) through (3.178) in computing the memory array required by the IMSL and Singleton subroutines. For N = 2100 the factors are $2^2\cdot5^2\cdot3\cdot7$, for which the FFTSNG quantities become:

N = 2100 = sequence length
MAXPF = 7 = maximum prime factor in N
K = 3·7 = 21 = product of the square-free factors
M = 6 = maximum number of prime factors

Using Eq (3.172), the FFTSNG memory array is given by

$$2\cdot2100 + 4\cdot7 + (20 \text{ or } 7) = 4248 \tag{3.179}$$

NOTE: There are two square-free factors, 3 and 7; therefore 20 is chosen for the last term of Eq (3.179).

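If this subroutine were used on the Cyber 74 computer, the program memory is added to the memory array to give a total memory of:

$$4248 + 1100 = 5348 \text{ words} \tag{3.180}$$

For FFTCC, the same factorization gives KT = 2 (squared factors 2 and 5) and JT = 2 (factors 3 and 7), so that M = 6 and K = 7. The product of the squared factors is

$$JK = 1\cdot f_1\cdot f_2 = 2\cdot5 = 10 \tag{3.184}$$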

and KB is

$$KB = \frac{N}{JK^2} - 2 = \frac{2100}{100} - 2 = 19 \tag{3.185}$$

The results of Eqs (3.181) through (3.185) provide the size of the work vector IWORD given by Eq (3.173):

$$\begin{aligned}\text{IWORD} &= 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2JK)\\ &= 18 + 3 + \max(24 + 7 + 42,\; 19 + 1 + 20)\\ &= 21 + \max(73,\, 40) = 94\end{aligned}$$

Substituting IWORD = 94 and N = 2100 into Eq (3.178) gives the memory array for FFTCC as:

$$2N + 2\,\text{IWORD} = 4200 + 188 = 4388 \tag{3.186}$$

Using this subroutine on the Cyber 74 computer requires 1061 words of program memory, which makes the total memory required equal to:

$$4388 + 1061 = 5449 \text{ words} \tag{3.187}$$

For this length N = 2100 sequence, the Singleton FFTSNG uses less memory (5348 words) than the IMSL FFTCC (5449 words).

The array memory requirements given by Eqs (3.172) and (3.178) are plotted in Figures 3.29 and 3.30 for N less than 200. It is readily observed that selectively adjusting N to be highly factorable (composite) minimizes the memory required by subroutines FFTCC and FFTSNG. As an example of how prime lengths increase the memory array sizes, consider N = 2099 for each algorithm. For FFTSNG the variables are MAXPF = 2099, K = 2099, and M = 1. Since N = 2099 contains only one square-free factor, the array NP can be dimensioned to M + 1 = 2. The memory array for FFTSNG becomes:

$$2N + 4\cdot\text{MAXPF} + 2 = 4198 + 8396 + 2 = 12596 \text{ words of memory array}$$

Adding the program memory of 1100 words yields the total memory required to execute FFTSNG on the Cyber 74:

$$\text{memory} = 12596 + 1100 = 13696 \tag{3.188}$$

For the IMSL FFT the variables are K = 2099, JK = 1, KT = 0, JT = 1, KB = 2097, and M = 1. The expression for IWORD becomes:

$$\begin{aligned}\text{IWORD} &= 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2JK)\\ &= 3 + 3 + \max(12605,\, 2100) = 12611\end{aligned}$$

The total array memory for execution on the Cyber 74 system is:

$$2N + 2\cdot\text{IWORD} = 2\cdot2099 + 2\cdot12611 = 29420 \tag{3.189}$$

which is more than five times the total memory required for N = 2100.
Figure 3.29. Memory Array vs N (N ≤ 200) for Singleton's FFT.
3.5 Fourier Transforms Using Fast Convolution Algorithms

The 1965 paper by Cooley and Tukey had a major impact on digital signal processing by stimulating the development and wide use of the FFT. More recently, several new ideas have been used to compute the DFT that have also impacted digital signal processing. In 1968 Rader observed that, when N is prime, computation of the DFT could be changed to a circular convolution by rearranging the data. Given a fast way to perform circular convolution, one then has a fast DFT method. Winograd showed the minimum number of multiplications for circular convolution of prime and prime power length sequences. He then proposed that these high-speed prime power convolutions be "nested" into long transforms to minimize multiplications. The Winograd nested algorithm has been studied and programmed (Silverman, 1977; McClellan and Nawab, 1979; Zohar, 1979) for computing the DFT of complex valued sequences.

An alternative to the Winograd algorithm, proposed by Kolba and Parks, combines the concept of fast convolution with conventional DFT techniques to give another efficient DFT implementation. Kolba and Parks' prime factor algorithm (PFA) uses the same reordering technique as the Winograd Fourier transform algorithm (WFTA). The original PFA (Kolba and Parks, 1978) has been modified (Burrus and Eschenbacher, 1980) so that it can transform the same sequence lengths as the WFTA.
This section presents the theory of the WFTA "small-N" algorithms, the data reordering (which is the same for the PFA and WFTA), the PFA theory, the real operations count, and the memory array requirements for both the PFA and WFTA. Since both algorithms follow a similar development, the conversion of a DFT to circular convolution and the data reordering are presented only once and apply to both algorithms.

3.5.1 Converting a DFT to Circular Convolution.

To convert the DFT expression to a circular convolution, the DFT matrix [W] must be "mapped" into the circular convolution matrix $[W_c]$. The mapping between these two matrices, and hence the basis for the WFTA and PFA, was developed by Rader in 1968.

Rader showed that if "N is prime, there is some number g, not necessarily unique, such that a one-to-one mapping from the integers i = 1, 2, ..., N-1 to the integers j = 1, 2, ..., N-1 is given by:

$$j = ((g^i))_N \tag{3.190}$$

where the notation $((x))_N$ implies x modulo N." The example of N = 7 and g = 3 using the mapping of Eq (3.190) gives:

 i:  1  2  3  4  5  6
 j:  3  2  6  4  5  1

The number g is referred to as a "primitive root" in number theory. The mapping of Eq (3.190) provides the convolution matrix $[W_c]$ from the DFT matrix [W]. Examples of this mapping are extensively treated in the references (Silverman, 1977; Kolba and Parks, 1977) and are not repeated in this paper.
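Eq (3.190) and the N = 7 table can be reproduced with modular exponentiation. The Python sketch below uses the built-in three-argument `pow`; the function names are illustrative, and the primitive-root search is a brute-force check intended only for small odd primes.

```python
def is_primitive_root(g, n):
    """True if g**1, ..., g**(n-1) mod the prime n take every value 1..n-1."""
    return len({pow(g, i, n) for i in range(1, n)}) == n - 1


def rader_mapping(n, g):
    """Return [j_1, ..., j_(n-1)] with j_i = ((g**i))_n, as in Eq (3.190)."""
    if not is_primitive_root(g, n):
        raise ValueError(f"{g} is not a primitive root of {n}")
    return [pow(g, i, n) for i in range(1, n)]


def smallest_primitive_root(n):
    """Brute-force search for the smallest primitive root of an odd prime n."""
    return next(g for g in range(2, n) if is_primitive_root(g, n))


print(rader_mapping(7, 3))  # [3, 2, 6, 4, 5, 1]
```

Because g is a primitive root, the list is a permutation of 1, ..., N-1; g = 2 fails for N = 7 because its powers cycle through only 2, 4 and 1.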

A brief example using the convolution matrix is presented to aid in developing the small-N algorithm operations count. Consider the following 3-point DFT written in matrix notation as:

$$\begin{bmatrix}X(0)\\X(1)\\X(2)\end{bmatrix} = \begin{bmatrix}W^0 & W^0 & W^0\\ W^0 & W^1 & W^2\\ W^0 & W^2 & W^4\end{bmatrix}\begin{bmatrix}x(0)\\x(1)\\x(2)\end{bmatrix} \tag{3.191}$$

where $W = e^{-j2\pi/3}$, so that $W^3 = W^0 = 1$ and $W^4 = W^1$. The circular convolution is given by:

$$\begin{bmatrix}\tilde X(1)\\ \tilde X(2)\end{bmatrix} = \begin{bmatrix}W^1 & W^2\\ W^2 & W^1\end{bmatrix}\begin{bmatrix}x(1)\\x(2)\end{bmatrix} \tag{3.192}$$

which provides $\tilde X(1)$ and $\tilde X(2)$. The DFT in Eq (3.191) can then be rewritten using Eq (3.192) to give:

$$\begin{aligned}X(0) &= W^0\bigl(x(0) + x(1) + x(2)\bigr)\\ X(1) &= W^0x(0) + \tilde X(1)\\ X(2) &= W^0x(0) + \tilde X(2)\end{aligned} \tag{3.193}$$

Using techniques similar to the one presented here, convolution expressions to perform DFTs have been developed for N = 2, 4, 5, 7, 8, 9 and 16.
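The same reordering works for any prime N, not only N = 3. The sketch below is a plain, unoptimized illustration of Rader's conversion: it permutes the inputs by powers of a primitive root g, evaluates one length-(N-1) circular convolution directly, and scatters the results back. It is not the WFTA small-N code; the function names are hypothetical, and `pow(g, -1, n)` requires Python 3.8 or later.

```python
import cmath


def direct_dft(x):
    """Reference DFT, X(k) = sum_n x(n) W^(nk) with W = exp(-j 2 pi / N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def rader_dft(x, g):
    """DFT of prime length N via Rader's reordering into a circular convolution.

    g must be a primitive root of N. The outputs X(1)..X(N-1) come from one
    length-(N-1) circular convolution, evaluated directly here (O(N^2)); a
    fast convolution would replace the inner sum in practice.
    """
    n = len(x)
    w = [cmath.exp(-2j * cmath.pi * e / n) for e in range(n)]
    g_inv = pow(g, -1, n)                              # modular inverse of g
    a = [x[pow(g, q, n)] for q in range(n - 1)]        # inputs permuted by g**q
    b = [w[pow(g_inv, q, n)] for q in range(n - 1)]    # kernel W^(g**-q)
    out = [0j] * n
    out[0] = sum(x)                                    # X(0) needs no multiplies
    for p in range(n - 1):
        conv = sum(a[q] * b[(p - q) % (n - 1)] for q in range(n - 1))
        out[pow(g_inv, p, n)] = x[0] + conv            # X(g**-p) = x(0) + conv
    return out
```

As in Eq (3.193), $x(0)$ is simply added to every convolution output, and $X(0)$ is a plain sum.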
3.5.2 Reordering the Data Arrays. Implementing the WFTA or the PFA in a useful form involves making long transforms from the short, fast-convolution transforms for 2, 3, 4, 5, 7, 8, 9, and 16. The general idea is "to convert a one-dimensional length $M = M_1 M_2 \cdots M_i$ transform into an i-dimensional transform requiring computation of i shorter length-$M_k$ transforms for k = 1, 2, ..., i" (Kolba and Parks, 1977). The mapping from one dimension to i dimensions is based on the Chinese Remainder Theorem, which requires relatively prime factors $M_1, M_2, \ldots, M_i$.

The example for two mutually prime factors given by Kolba and Parks (1977) is presented here because the mapping is common to both the WFTA and PFA.

In the DFT

$$X(k) = \sum_{n=0}^{N-1} x(n)\,W^{nk}, \qquad W = e^{-j2\pi/N} \tag{3.194}$$

the index n of the input sequence is referred to as the input index, and the index k of the output sequence X(k) is called the output index. Mapping from one to two dimensions maps the input index n into a pair of indices:

$$n_1 = r_1 n \bmod M_1,\quad n_1 = 0,\ldots,M_1-1,\qquad r_1 M_2 \equiv 1 \pmod{M_1}$$

$$n_2 = r_2 n \bmod M_2,\quad n_2 = 0,\ldots,M_2-1,\qquad r_2 M_1 \equiv 1 \pmod{M_2}$$

The output index is mapped as:

$$k_1 = k \bmod M_1,\quad k_1 = 0,\ldots,M_1-1$$

$$k_2 = k \bmod M_2,\quad k_2 = 0,\ldots,M_2-1$$

The mapping from two dimensions back to one dimension for the output index is:

$$k = (s_1k_1 + s_2k_2) \bmod N \tag{3.195}$$

where $s_1 \equiv 1 \pmod{M_1}$ and $s_1 \equiv 0 \pmod{M_2}$, and $s_2 \equiv 0 \pmod{M_1}$ and $s_2 \equiv 1 \pmod{M_2}$.

While the same inverse mapping in Eq (3.195) could be used for the input index n, it is more convenient (Kolba and Parks, 1977) to use:

$$n = (M_2n_1 + M_1n_2) \bmod N \tag{3.196}$$

When the mappings in Eqs (3.195) and (3.196) are used, the DFT becomes:

$$X(k_1,k_2) = \sum_{n_1=0}^{M_1-1}\sum_{n_2=0}^{M_2-1} x(n_1,n_2)\,W_{M_1}^{n_1k_1}\,W_{M_2}^{n_2k_2} \tag{3.197}$$

At this point the WFTA and PFA approach the implementation of Eq (3.197) differently, as described below.
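The index mappings of Eqs (3.195) and (3.196) can be checked exhaustively for small factor pairs. In the sketch below (hypothetical helper names), $s_1$ and $s_2$ are built from modular inverses, and the check confirms that both maps are one-to-one and that $W_N^{nk} = W_{M_1}^{n_1k_1}W_{M_2}^{n_2k_2}$, which is what allows Eq (3.197) to contain no twiddle factors.

```python
def crt_index_maps(m1, m2):
    """Input map of Eq (3.196) and output map of Eq (3.195) for N = m1*m2.

    m1 and m2 must be relatively prime; pow(..., -1, ...) raises ValueError
    otherwise (Python 3.8+). Returns two dicts keyed by the 2-D index pair.
    """
    n = m1 * m2
    s1 = m2 * pow(m2, -1, m1)   # s1 = 1 mod m1, s1 = 0 mod m2
    s2 = m1 * pow(m1, -1, m2)   # s2 = 0 mod m1, s2 = 1 mod m2
    pairs = [(a, b) for a in range(m1) for b in range(m2)]
    inp = {(a, b): (m2 * a + m1 * b) % n for a, b in pairs}
    out = {(a, b): (s1 * a + s2 * b) % n for a, b in pairs}
    return inp, out


def check_maps(m1, m2):
    """Both maps are one-to-one, and W_N^(nk) = W_M1^(n1 k1) * W_M2^(n2 k2)."""
    n = m1 * m2
    inp, out = crt_index_maps(m1, m2)
    assert sorted(inp.values()) == list(range(n))
    assert sorted(out.values()) == list(range(n))
    for (n1, n2), idx_n in inp.items():
        for (k1, k2), idx_k in out.items():
            # Same exponent of W_N: n*k = m2*n1*k1 + m1*n2*k2 (mod N).
            assert (idx_n * idx_k - m2 * n1 * k1 - m1 * n2 * k2) % n == 0
    return True
```

Because $W_N^{M_2} = W_{M_1}$ and $W_N^{M_1} = W_{M_2}$, the exponent identity checked in the inner loop is exactly the factorization used in Eq (3.197).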

3.5.3 The Winograd Fourier Transform. A new algorithm for computing the DFT was proposed by Winograd (Winograd, 1976). The WFTA keeps the number of real additions at about the FFT level while reducing the number of real multiplications necessary to evaluate the DFT (Silverman, 1977). This paper will not derive the "small-N" algorithms. Readers interested in the derivation of the WFTA are referred to the articles which treat the topic extensively (Winograd, 1976; Silverman, 1977; Kolba and Parks, 1977; Zohar, 1979).
Winograd's proof started with the N by N matrix $W_N$ with elements

$$W^{ir} = W^{((ir))_N} \tag{3.198}$$

which can be decomposed as

$$W_N = O_N\,D_N\,I_N \tag{3.199}$$

where $I_N$ is a u by N incidence matrix with values of 0, 1, and -1 only, $D_N$ is a u by u diagonal matrix, and $O_N$ is an N by u incidence matrix (Silverman, 1977). The decomposition of $W_N$ is possible with large values of u relative to N (i.e., $u = N^2$). Winograd solved the more difficult problem of decomposing $W_N$ into incidence matrices of dimension u smaller than $N^2$. Winograd applied field theory to give solutions where u approximately equals N for small values of N, namely N = 2, 3, 4, 5, 7, 8, 9, and 16 (Silverman, 1977).

Not only did Winograd prove the minimum multiplication count for the above small-N DFTs, but he also proposed a special structure of Eq (3.197) using Eq (3.199). The two-dimensional transform in Eq (3.197) may be implemented by first calculating length-$M_2$ DFTs:

$$y(n_1,k_2) = \sum_{n_2=0}^{M_2-1} x(n_1,n_2)\,W_{M_2}^{n_2k_2} \tag{3.200}$$

and then calculating length-$M_1$ DFTs:

$$X(k_1,k_2) = \sum_{n_1=0}^{M_1-1} y(n_1,k_2)\,W_{M_1}^{n_1k_1} \tag{3.201}$$
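Using the notation of Eq (3.199), the short transform can be written in terms of the input additions $i^{(1)}$, the multiplications $d^{(1)}$, and the output additions $o^{(1)}$ for the length-$M_1$ transform; the length-$M_2$ transform uses $i^{(2)}$, $o^{(2)}$, and $d^{(2)}$ (Kolba and Parks, 1977). Eq (3.200) then becomes:

$$y(n_1,k_2) = \sum_{r=0}^{u_2-1} o^{(2)}_{k_2r}\,d^{(2)}_r \sum_{n_2=0}^{M_2-1} i^{(2)}_{rn_2}\,x(n_1,n_2) \tag{3.202}$$

$X(k_1,k_2)$ in Eq (3.201) is a length-$M_1$ transform of $y(n_1,k_2)$, which can also be written:

$$X(k_1,k_2) = \sum_{m=0}^{u_1-1} o^{(1)}_{k_1m}\,d^{(1)}_m \sum_{n_1=0}^{M_1-1} i^{(1)}_{mn_1}\,y(n_1,k_2) \tag{3.203}$$

Substituting Eq (3.202) into Eq (3.203) gives:

$$X(k_1,k_2) = \sum_{m=0}^{u_1-1} o^{(1)}_{k_1m}\,d^{(1)}_m \sum_{n_1=0}^{M_1-1} i^{(1)}_{mn_1} \sum_{r=0}^{u_2-1} o^{(2)}_{k_2r}\,d^{(2)}_r \sum_{n_2=0}^{M_2-1} i^{(2)}_{rn_2}\,x(n_1,n_2) \tag{3.204}$$

The order of summation may be interchanged to "nest" the multiplications in the center, which gives Eq (3.204) rewritten as:

$$X(k_1,k_2) = \sum_{r=0}^{u_2-1} o^{(2)}_{k_2r} \sum_{m=0}^{u_1-1} o^{(1)}_{k_1m}\; d^{(1)}_m\, d^{(2)}_r \sum_{n_1=0}^{M_1-1} i^{(1)}_{mn_1} \sum_{n_2=0}^{M_2-1} i^{(2)}_{rn_2}\,x(n_1,n_2) \tag{3.205}$$

Here $u_1$ and $u_2$ denote the number of multiplications (the dimension of $D$) in the length-$M_1$ and length-$M_2$ transforms.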
Eq (3.205) is the form that was implemented in FORTRAN code (McClellan and Nawab, 1979) and listed in Appendix H.

As an example of the "nesting" structure for the WFTA, consider the case of N = 3 given in Eqs (3.190) through (3.192). First, let

$$\begin{bmatrix}\tilde X(1)\\ \tilde X(2)\end{bmatrix} = \begin{bmatrix}M_1 + M_2\\ M_1 - M_2\end{bmatrix} \tag{3.206}$$

Then equating Eqs (3.206) and (3.192) gives:

$$\begin{bmatrix}M_1 + M_2\\ M_1 - M_2\end{bmatrix} = \begin{bmatrix}x(1)W^1 + x(2)W^2\\ x(1)W^2 + x(2)W^1\end{bmatrix} \tag{3.207}$$

Substituting

$$W^1 = e^{-j2\pi/3} = -\tfrac12 - j\tfrac{\sqrt3}{2}, \qquad W^2 = e^{-j4\pi/3} = -\tfrac12 + j\tfrac{\sqrt3}{2} \tag{3.208}$$

into Eq (3.207) provides:

$$\begin{aligned}M_1 + M_2 &= -\frac{x(1)}{2} - j\frac{\sqrt3}{2}x(1) - \frac{x(2)}{2} + j\frac{\sqrt3}{2}x(2)\\ M_1 - M_2 &= -\frac{x(1)}{2} + j\frac{\sqrt3}{2}x(1) - \frac{x(2)}{2} - j\frac{\sqrt3}{2}x(2)\end{aligned} \tag{3.209}$$

Solving for $M_1$ and $M_2$ gives:

$$M_1 = -\tfrac12\bigl(x(1) + x(2)\bigr), \qquad M_2 = -j\tfrac{\sqrt3}{2}\bigl(x(1) - x(2)\bigr) \tag{3.210}$$


For this algorithm to be used in Winograd's algorithm, the multiplications by $W^0 = 1$ must be accounted for and minimized. This is accomplished by modifying the length-3 DFT to:


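$$a_1 = x(1) + x(2), \qquad a_2 = x(1) - x(2), \qquad a_3 = x(0) + a_1 \tag{3.211}$$

$$M_1 = \left(-\tfrac12 - 1\right)a_1 = -\tfrac32\,a_1, \qquad M_2 = -j\tfrac{\sqrt3}{2}\,a_2, \qquad M_3 = W^0a_3 = a_3 \tag{3.212}$$

$$S_1 = M_3 + M_1, \qquad X(0) = M_3, \qquad X(1) = S_1 + M_2, \qquad X(2) = S_1 - M_2 \tag{3.213}$$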

Eqs (3.211) through (3.213) result in 2 multiplications, 1 multiplication by $W^0$, and 6 additions, which can now be expressed in the $X = O\cdot D\cdot I\cdot x$ notation as:


■X(0)' 

1 - 

0 

0 

1 _ 

- 1 

0 

0 

_ 1 

■1  1  1" 

■x(0)’ 

X(l) 

= 

111 

« 

1  -3/2  0 

• 

Oil 

• 

x(l) 

-X  ( 2 ). 

.1  1  -i. 

.1  0  -j  /3/2_ 

1 

0 

0 

! 

1—* 

1 _ 

.x{2). 

and  then  rewritten  into  sunmiations  as: 


(3.214) 


X(k) 


u-1 


N-1 


0,  d  1  x(n) 

n  kr  r  -  rn  '  ' 
r=0  n=0 


(3.215) 
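Eqs (3.211) through (3.213) translate line for line into code. The sketch below (function names are illustrative) keeps the multiplication by $W^0$ as an explicit assignment so that the 2 + 1 multiplications and 6 additions can be counted directly. The coefficient $-3/2$ on $a_1$ compensates for $x(1) + x(2)$ already being included in $a_3$.

```python
import cmath
import math


def wfta3(x0, x1, x2):
    """3-point DFT in the form of Eqs (3.211)-(3.213)."""
    a1 = x1 + x2
    a2 = x1 - x2
    a3 = x0 + a1                          # 3 input additions so far
    m1 = (-0.5 - 1.0) * a1                # -(3/2) a1
    m2 = -1j * (math.sqrt(3) / 2) * a2    # -j (sqrt(3)/2) a2
    m3 = a3                               # multiplication by W^0 = 1
    s1 = m3 + m1
    return [m3, s1 + m2, s1 - m2]         # 3 output additions


def dft3(x0, x1, x2):
    """Reference 3-point DFT from Eq (3.191), W = exp(-j 2 pi / 3)."""
    w = cmath.exp(-2j * cmath.pi / 3)
    return [x0 + x1 + x2, x0 + x1 * w + x2 * w**2, x0 + x1 * w**2 + x2 * w**4]
```

Comparing `wfta3` with `dft3` for arbitrary complex inputs confirms that the modified form still computes the 3-point DFT.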


The fast convolution cases for N = 2, 4, 5, 7, 8, 9, and 16 were developed similarly to the method used for N = 3 above. The explicit equations for these cases provide the small-N operations count shown in Table 3.6, which is used in computing the real operations count as a function of N for the WFTA.
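The nesting of Eq (3.205) can be demonstrated with two small modules. The sketch below nests a 2-point module and the 3-point factors of Eq (3.214) into a length-6 transform. All multiplications happen in one central step, on the products of the two modules' diagonal entries. The 2-point factors (a sum and a difference, with two unit multipliers) are an assumption consistent with Table 3.6, and the code is an illustration, not the FORTRAN program listed in this paper.

```python
import cmath
import math

# Small-N modules written as X = O * diag(d) * I * x, following Eq (3.199).
# The 3-point factors are those of Eq (3.214); the 2-point module is assumed.
SMALL_N = {
    2: ([[1, 0], [0, 1]], [1, 1], [[1, 1], [1, -1]]),
    3: ([[1, 0, 0], [1, 1, 1], [1, 1, -1]],
        [1, -1.5, -1j * math.sqrt(3) / 2],
        [[1, 1, 1], [0, 1, 1], [0, 1, -1]]),
}


def wfta_nested(x, m1, m2):
    """Length m1*m2 DFT using the nested structure of Eq (3.205)."""
    n = m1 * m2
    o1, d1, i1 = SMALL_N[m1]
    o2, d2, i2 = SMALL_N[m2]
    u1, u2 = len(d1), len(d2)
    s1 = m2 * pow(m2, -1, m1)            # output map coefficients, Eq (3.195)
    s2 = m1 * pow(m1, -1, m2)
    # Input reordering, Eq (3.196).
    grid = [[x[(m2 * a + m1 * b) % n] for b in range(m2)] for a in range(m1)]
    # Input additions of both modules.
    t = [[sum(i1[m][a] * sum(i2[r][b] * grid[a][b] for b in range(m2))
              for a in range(m1))
          for r in range(u2)]
         for m in range(u1)]
    # Nested multiplications: the only multiplies, u1 * u2 of them.
    t = [[d1[m] * d2[r] * t[m][r] for r in range(u2)] for m in range(u1)]
    # Output additions, then output reordering.
    out = [0j] * n
    for k1 in range(m1):
        for k2 in range(m2):
            val = sum(o2[k2][r] * sum(o1[k1][m] * t[m][r] for m in range(u1))
                      for r in range(u2))
            out[(s1 * k1 + s2 * k2) % n] = val
    return out


def direct_dft(x):
    """Reference DFT for checking the nested result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

Either factor may be placed innermost; `wfta_nested(x, 2, 3)` and `wfta_nested(x, 3, 2)` both reproduce the length-6 DFT with 6 central multiplications.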

3.5.4 The Prime Factor Algorithm Theory. An alternative to the nested algorithm proposed by Winograd was developed by Kolba and Parks. Because of the algorithm's structure it is called the prime factor algorithm (PFA), and it uses a modified version of Winograd's high-speed convolution technique.

Converting the DFT to circular convolution and reordering the data arrays for the PFA are identical up through Eq (3.197), where $W_{M_1} = e^{-j2\pi/M_1}$ and $W_{M_2} = e^{-j2\pi/M_2}$, with $M_1$ and $M_2$ relatively prime.

The transform in Eq (3.197) may be performed by calculating length-$M_2$ DFTs:

$$y(n_1,k_2) = \sum_{n_2=0}^{M_2-1} x(n_1,n_2)\,W_{M_2}^{n_2k_2} \tag{3.216}$$

and then calculating length-$M_1$ DFTs:

$$X(k_1,k_2) = \sum_{n_1=0}^{M_1-1} y(n_1,k_2)\,W_{M_1}^{n_1k_1} \tag{3.217}$$

The expressions in Eqs (3.216) and (3.217) are implemented as short DFTs instead of the "nested" operations shown in Eq (3.205).
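The row-column form of Eqs (3.216) and (3.217) is sketched below for two relatively prime factors. Plain short DFTs stand in for the small-N modules, the input and output reordering follows Eqs (3.196) and (3.195), and no twiddle factors appear between the two passes. Function names are hypothetical.

```python
import cmath


def short_dft(v):
    """Plain DFT of a short sequence; stands in for a small-N module."""
    m = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m))
            for k in range(m)]


def pfa_two_factor(x, m1, m2):
    """Length m1*m2 DFT via Eqs (3.216)-(3.217); m1, m2 relatively prime."""
    n = m1 * m2
    if len(x) != n:
        raise ValueError("len(x) must equal m1 * m2")
    s1 = m2 * pow(m2, -1, m1)       # output map coefficients, Eq (3.195)
    s2 = m1 * pow(m1, -1, m2)
    # Input reordering, Eq (3.196): row n1, column n2.
    grid = [[x[(m2 * n1 + m1 * n2) % n] for n2 in range(m2)] for n1 in range(m1)]
    # Eq (3.216): m1 short DFTs of length m2 (no twiddle factors follow).
    y = [short_dft(row) for row in grid]
    out = [0j] * n
    for k2 in range(m2):
        # Eq (3.217): a short DFT of length m1 down column k2.
        col = short_dft([y[n1][k2] for n1 in range(m1)])
        for k1 in range(m1):
            out[(s1 * k1 + s2 * k2) % n] = col[k1]
    return out
```

In a real PFA each `short_dft` call would be one of the small-N modules counted in Table 3.7; here it is a direct sum purely for clarity.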
TABLE 3.6
SMALL-N OPERATIONS COUNT FOR WFTA

   N    Mult    Mult by W^0    Adds
   2      0          2           2
   3      2          1           6
   4      0          4           8
   5      5          1          17
   7      8          1          36
   8      2          6          26
   9     12          1          44
  16     10          8          74


Apart from the overall structure, the small-N algorithms are the only part that differs between the two algorithms. In the PFA structure, the small-N algorithms are modified to permit a "shift operation" instead of a multiplication by 1/2. For the N = 3 example, Eqs (3.211) through (3.213) are modified to:

$$a_1 = x(1) + x(2), \qquad a_2 = x(1) - x(2), \qquad a_3 = x(0) + a_1 \tag{3.218}$$

$$M_1 = -\tfrac12\,a_1, \qquad M_2 = -j\tfrac{\sqrt3}{2}\,a_2 \tag{3.219}$$

$$S_1 = x(0) + M_1, \qquad X(0) = a_3, \qquad X(1) = S_1 + M_2, \qquad X(2) = S_1 - M_2 \tag{3.220}$$

Eqs (3.218) through (3.220) have 1 multiplication, 1 shift (multiplication by 1/2), and 6 additions.
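The PFA version of the 3-point module differs from the WFTA version in that the $-1/2$ factor is applied on its own and $x(0)$ is added back afterward. In the sketch below the "shift" is written as a division by 2 (with a sign change); in floating point that is an ordinary arithmetic operation, whereas on integer or fixed-point hardware it would be an arithmetic right shift. The function name is illustrative.

```python
import math


def pfa3(x0, x1, x2):
    """3-point DFT in the PFA form of Eqs (3.218)-(3.220)."""
    a1 = x1 + x2
    a2 = x1 - x2
    a3 = x0 + a1
    m1 = -a1 / 2                          # the "shift": scaling by 1/2 (and a sign)
    m2 = -1j * (math.sqrt(3) / 2) * a2    # the single true multiplication
    s1 = x0 + m1
    return [a3, s1 + m2, s1 - m2]         # 6 additions in total
```

Counting the lines confirms the tally in the text: one multiplication (`m2`), one shift (`m1`), and six additions.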

Similar small-N DFTs result for N = 2, 4, 5, 7, 8, 9, and 16, producing the operations count for the PFA small-N algorithms shown in Table 3.7 (Burrus and Eschenbacher, 1980). (Complex valued sequences require the counts in Table 3.7 to be doubled.) If the implementation of the PFA does not use "shifts," the multiplication count must be adjusted to reflect the multiplications by 1/2. The original FORTRAN program (Kolba, 1977) did not include the factor of 16. Later modifications (Burrus and Eschenbacher, 1980) included the factor of 16, which made the PFA capable of transforming the same sequence lengths as the WFTA. It should be noted that neither FORTRAN version implemented the "shifts," which increased the number of real multiplications.

TABLE 3.7
PFA SMALL-N DFT OPERATIONS COUNT

   N    Multiplies    Shifts    Adds
   2        0           0         2
   3        1           1         6
   4        0           0         8
   5        4           2        17
   7        8           0        36
   8        2           0        26
   9        8           2        49
  16       10           0        74

NOTE: For complex sequences the values in the table must be doubled.

3.5.5 Real Operations for WFTA. To use the WFTA, the length-N sequence must be factorable into R relatively prime factors $N_1 N_2 \cdots N_R$, where each factor corresponds to one of the Winograd small-N algorithms for 2, 3, 4, 5, 7, 8, 9, and 16. It has been shown (Silverman, 1977) that the number of real multiplications is a function of the factors of N. To aid in the development of the number of real operations, the following terms are defined:

$M_r$ = number of real multiplications in factor $N_r$
$A_r$ = number of real additions in factor $N_r$
$N_r$ = $r$th factor of N

Winograd proved that the matrix $D_r$ is an $M_r$ by $M_r$ diagonal matrix, and that $O_r$ and $I_r$ are $N_r$ by $M_r$ and $M_r$ by $N_r$ incidence matrices, respectively, with only 0, 1, or -1 for entries. Evaluating the nested multiplications of D (Silverman, 1977) requires:

$$\text{NMULT} = M_1 M_2 \cdots M_R \tag{3.221}$$

which is the real multiplications count for real valued sequences. For complex valued transforms, Eq (3.221) must be multiplied by 2.




All previous multiplication counts (Winograd, 1976; Kolba and Parks, 1977; Silverman, 1977) use only Eq (3.221) as the source of real multiplications for the WFTA. The multiplications in Eq (3.221) are all performed by the MULT subroutine in Figure 3.31. Other real multiplications are required in the WFTA for computing the multiplier coefficients and for determining the input and output permutation vectors in the INISHL subroutine in Figure 3.31.

The  DFT  multiplier  coefficients  are  computed  in  lines 
1450-1510  of  the  WFTA  listed  in  Appendix  H  and  require: 

real  mult  =  3  *  NMULT  (3.222) 

where  N  MULT  was  computed  in  Eq  (3.221).  Determining  the 
output  permutation  vector  in  lines  2080-2170  requires; 

real  mult  =  4  *  N  (3.223) 

where N is the sequence length to be transformed. Combining Eqs (3.222) and (3.223) provides the number of real operations required for initializing the WFTA. Subsequent transforms of the same sequence length do not require initialization. The first complex transform of length N using the WFTA requires:

real  mult  =  2  *  NMULT  +  3  *  NMULT  +  4  *  N  (3.224) 
Subsequent  complex  transforms  require: 

real  mult  =  2  *  NMULT  (3.225) 
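To make Eqs (3.221) through (3.225) concrete, the short Python sketch below evaluates them for N = 630 = 2·5·7·9, taking the small-N multiply counts M(N) from Table 3.8. The function names are illustrative only and are not part of the Appendix H program, and the sketch assumes the factors passed in are relatively prime without checking it.

```python
# Sketch: WFTA real multiplications, Eqs (3.221) through (3.225).
# M(N) values are the McClellan and Nawab counts of Table 3.8.
from math import prod

M_SMALL = {2: 2, 3: 3, 4: 4, 5: 6, 7: 9, 8: 8, 9: 11, 16: 18}

def nmult(factors):
    """Eq (3.221): NMULT = M1 M2 ... MR (real-data count)."""
    return prod(M_SMALL[f] for f in factors)

def wfta_mults(factors):
    """Return (first complex transform, each later complex transform)."""
    n, m = prod(factors), nmult(factors)
    first = 2 * m + 3 * m + 4 * n      # Eq (3.224): transform + initialization
    later = 2 * m                      # Eq (3.225): transform only
    return first, later

print(nmult([2, 5, 7, 9]))        # 1188
print(wfta_mults([2, 5, 7, 9]))   # (8460, 2376)
```

The post-initialization count of 2376 is the WFTA multiplication figure quoted for N=630 in Section 4.5.3.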

Counting the number of real additions is more complicated because the factorization order of N changes the real additions count (Silverman, 1977). For a given factorization of N = N1 N2 ... NR, the number of real additions can be determined by examining the additions performed in the WEAVE1 and WEAVE2 subroutines in Figure 3.31. First, the real additions from the "WEAVEs" can be developed by considering the special case of N = N1 N2, where N1 is defined as the "innermost" factor and N2 is the "outermost" factor. For two factors of N, Silverman has shown the number of real additions to be:

$$A(2) = N_1 A_2 + M_2 A_1 \qquad (3.226)$$

(Recall that A2 equals the real adds to evaluate factor N2 and M2 equals the real multiplies to evaluate N2.) Now consider N = N1 N2 N3, where (N1 N2) is considered to be the "innermost" factor. The number of real additions becomes:

$$A(3) = (N_1 N_2) A_3 + M_3 A(2) = N_1 N_2 A_3 + N_1 M_3 A_2 + M_3 M_2 A_1 \qquad (3.227)$$

By iterative substitution, the number of additions for N = N1 N2 N3 N4 becomes:

$$\begin{aligned}
A(4) &= (N_1 N_2 N_3) A_4 + M_4 A(3) \\
&= N_1 N_2 N_3 A_4 + M_4 N_1 N_2 A_3 + M_4 M_3 N_1 A_2 + M_4 M_3 M_2 A_1 \qquad (3.228)
\end{aligned}$$

Eqs (3.226) through (3.228) count the additions for one real sequence. Doubling them for complex data and extending the pattern to R factors gives a compact expression for the number of real additions needed in the WEAVE subroutines:

$$A(R) = 2 \sum_{r=1}^{R} \Big( \prod_{j=1}^{r-1} N_j \Big) \Big( \prod_{k=r+1}^{R} M_k \Big) A_r \qquad (3.229)$$

The expression in Eq (3.229) represents only the real additions used in WEAVE1 and WEAVE2. Other additions are required by the INISHL initialization subroutine to index the DFT coefficient array and compute the output index vector.

The DFT coefficient array is indexed with a J counter in line 1500 of the FORTRAN WFTA program in Appendix H. This part of the INISHL subroutine requires NMULT real additions. The input index array INDX1 requires another J counter in line 1720, which uses N real additions. The output index array INDX2 uses a J counter in line 2160, which uses N real additions. Also, the INDX2 computation requires 8N real additions in line 2120.

Totaling the real additions in the initialization subroutine gives:

real adds = NMULT + 10N (3.330)

Adding the results of Eq (3.330) to Eq (3.229) gives the total additions needed to transform an N length sequence for the first time. Subsequent transforms at the same sequence length N require only the number of adds in Eq (3.229).
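Because the addition count depends on the order of the factors, a short Python sketch is a convenient way to evaluate Eq (3.229) over every ordering and then add the initialization cost of Eq (3.330). The (M, A) pairs come from Table 3.8; the helper names are hypothetical, and the sketch makes no assumption about which ordering the Appendix H program actually uses.

```python
# Sketch: WFTA real additions for complex data, Eqs (3.229) and (3.330).
# (M, A) small-N counts are from Table 3.8. The total depends on the
# factor order, so every ordering is tried.
from itertools import permutations
from math import prod

SMALL = {2: (2, 2), 3: (3, 6), 4: (4, 8), 5: (6, 17),
         7: (9, 36), 8: (8, 26), 9: (11, 44), 16: (18, 74)}

def weave_adds(order):
    """Eq (3.229); order[0] is the innermost factor N1, order[-1] is NR."""
    total = 0
    for r, n_r in enumerate(order):
        inner = prod(order[:r])                             # N1 ... N(r-1)
        outer = prod(SMALL[n][0] for n in order[r + 1:])    # M(r+1) ... MR
        total += inner * outer * SMALL[n_r][1]              # times A_r
    return 2 * total

def init_adds(order):
    """Eq (3.330): NMULT + 10N additions in the INISHL subroutine."""
    nmult = prod(SMALL[n][0] for n in order)
    return nmult + 10 * prod(order)

best = min(permutations([2, 5, 7, 9]), key=weave_adds)
print(best, weave_adds(best))                   # (5, 7, 9, 2) 22072
print(weave_adds(best) + init_adds(best))       # 29560, first transform
```

For N = 630 the cheapest ordering puts 5 innermost and 2 outermost and gives 22072 additions, which is the WFTA addition count quoted for N=630 in Section 4.5.3; the natural order (2, 5, 7, 9) would give 23188.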

The FORTRAN WFTA program written by McClellan and Nawab (1979) decreased the number of real multiplications for N=9 from 13 to 11, while the number of additions remained constant at 44. Modifying Table 3.6 to reflect the new multiply count for N=9 gives the McClellan and Nawab real operations counts for the small-N algorithms shown in Table 3.8.

Using Eqs (3.229) and (3.330) with Table 3.8 gives the number of real operations for all permissible sequence lengths shown in Table 3.9a and b. The columns labeled "REAL MULTI" and "REAL ADDI" represent the operations for the initial transform of length N. The columns labeled "REAL MULT" and "REAL ADD" give the operations count for subsequent transformations of the same sequence length.

The numbers of real operations are plotted as a function of N in Figures 3.32 and 3.33. These graphs demonstrate the large reduction possible after the WFTA has been initialized for an N length sequence.

3.5.6 Memory Requirements for WFTA. The FORTRAN subroutine WFTA listed in Appendix H requires 2348 words of program memory when compiled for the CDC Cyber 74 computer. The memory array requirements are given by:

XR, XI, INDX1, INDX2: length N

COEF, SR, SI: length NMULT = M1 M2 ... MR, which is set by the factors of N. NMULT is listed in Table 3.9a and b.

C03,  C04,  COS,  C03,  C016,  CDA,  CDD,  CDC, 

CDD:  Total  of  88 

The original version of WFTA dimensioned INDX1, INDX2, COEF, SR, and SI to their maximum possible lengths of 5040, 5040, 10692, 10692, and 10692, respectively. This made the memory
TABLE 3.8

McCLELLAN AND NAWAB'S WFTA
REAL OPERATIONS FOR THE SMALL-N ALGORITHMS

   N    M(N)   A(N)
   2      2      2
   3      3      6
   4      4      8
   5      6     17
   7      9     36
   8      8     26
   9     11     44
  16     18     74
Figure 3.32. Real Multiplications for WFTA (plotted against N, the sequence length).


array storage very large even for the shortest sequence lengths:

memory  array  =  2N  +  2*5040  +  3*10692  +  88 

=  2N  +  42244  (3.331) 

The memory arrays INDX1, INDX2, COEF, SR, and SI were variably dimensioned in the author's version of WFTA in Appendix H. This reduced the memory arrays required to:

memory  array  =  4N  +  3NMULT  +88  (3.332) 

The  results  of  Eq  (3.332)  are  listed  in  Table  3.9a  and  b 
for  all  values  of  N.  A  comparison  of  the  memory  required 
by  Eqs  (3.331)  and  (3.332)  is  plotted  in  Figure  3.34  which 
shows  the  drastic  savings  in  memory  storage  by  using  the 
variable  dimensions.  The  "cost"  of  variable  dimensions  is 
more  work  for  the  user  of  WFTA  because  the  dimensions  must 
be  passed  to  the  WFTA  subroutine  using  more  arguments  in  the 
subroutine  call.  The  original  version  required: 

CALL WFTA (XR, XI, N, INIT, IERR)

The  modified  WFTA  call  is: 

CALL WFTA (N, XR, XI, INIT, IERR, SR, SI, COEF, M, INDX1, INDX2)

where  M  =  NMULT.  The  increased  complexity  of  the  second 
call  is  worth  the  savings  of  memory  arrays. 
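The size of the savings is easy to see numerically. The Python sketch below evaluates Eqs (3.331) and (3.332) for N = 630, with NMULT = 2·6·9·11 = 1188 taken from the Table 3.8 multiply counts; the function names are illustrative only.

```python
# Sketch: WFTA array memory, fixed dimensions (Eq 3.331) versus
# variable dimensions (Eq 3.332). NMULT is supplied by the caller.
def memory_fixed(n):
    # XR, XI of length N; INDX1, INDX2 at 5040; COEF, SR, SI at 10692;
    # 88 words of small-N constant arrays
    return 2 * n + 2 * 5040 + 3 * 10692 + 88

def memory_variable(n, nmult):
    # XR, XI, INDX1, INDX2 of length N; COEF, SR, SI of length NMULT
    return 4 * n + 3 * nmult + 88

n, nmult = 630, 2 * 6 * 9 * 11      # N = 2*5*7*9; M(N) values from Table 3.8
print(memory_fixed(n), memory_variable(n, nmult))   # 43504 6172
```

The fixed lengths are not arbitrary: 5040 = 16·9·5·7 is the largest permissible N, and 10692 = 18·11·6·9 is its NMULT, so the original version always reserved space for the worst case.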

3.5.7 Real Operations for the PFA. The real operations for the PFA come from two sources: reordering the data and performing the small-N DFTs. The unscrambling step, which maps the PFA result from arrays X and Y to arrays A and B, requires N real additions and no multiplications.
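The second source, computing the small-N DFTs using fast convolution, has been shown (Kolba and Parks, 1977) for two factors (M1 M2) to be:

$$\text{real mult} = 2\,(M_2 U_1 + M_1 U_2) \qquad (3.333)$$

$$\text{real add} = 2\,(M_2 A_1 + M_1 A_2) \qquad (3.334)$$

for three factors (M1 M2 M3):

$$\text{real mult} = 2\,(M_2 M_3 U_1 + M_1 M_3 U_2 + M_1 M_2 U_3) \qquad (3.335)$$

$$\text{real add} = 2\,(M_2 M_3 A_1 + M_1 M_3 A_2 + M_1 M_2 A_3) \qquad (3.336)$$

and for four factors:

$$\text{real mult} = 2\,(M_2 M_3 M_4 U_1 + M_1 M_3 M_4 U_2 + M_1 M_2 M_4 U_3 + M_1 M_2 M_3 U_4) \qquad (3.337)$$

$$\text{real add} = 2\,(M_2 M_3 M_4 A_1 + M_1 M_3 M_4 A_2 + M_1 M_2 M_4 A_3 + M_1 M_2 M_3 A_4) \qquad (3.338)$$

where U_i is the number of multiplications required for the factor M_i and A_i is the number of additions required for M_i. (In this section M_i denotes a factor of N, not a multiplication count as in Section 3.5.5.)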

Notice that complex data transforms have been assumed in Eqs (3.333) through (3.338), so the numbers of multiplications and additions were multiplied by two.
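Since the product of all factors other than M_i is simply N/M_i, Eqs (3.333) through (3.338) collapse to real mult = 2 Σ (N/M_i) U_i and real add = 2 Σ (N/M_i) A_i for any number of factors. The Python sketch below uses that form and checks it against the written-out four-factor Eq (3.337). The (M_i, U_i, A_i) values are arbitrary placeholders, not the Table 3.10 counts, and the optional mapping term adds the N additions of the unscrambling step described at the start of this section.

```python
# Sketch: PFA small-N operation counts for complex data, Eqs (3.333)-(3.338)
# in the general form 2 * sum((N / M_i) * U_i) and 2 * sum((N / M_i) * A_i).
# Each factor is (M_i, U_i, A_i): factor length, its multiplies, its adds.
from math import prod

def pfa_counts(factors, include_mapping=False):
    n = prod(m for m, _, _ in factors)
    # n // m is the product of all the other (relatively prime) factors
    mults = 2 * sum((n // m) * u for m, u, _ in factors)
    adds = 2 * sum((n // m) * a for m, _, a in factors)
    if include_mapping:
        adds += n        # N additions for the output mapping (X, Y -> A, B)
    return mults, adds

# Arbitrary placeholder values, NOT the Table 3.10 counts.
f = [(4, 1, 2), (5, 3, 4), (7, 5, 6), (9, 7, 8)]
(M1, U1, _), (M2, U2, _), (M3, U3, _), (M4, U4, _) = f
eq_337 = 2 * (M2*M3*M4*U1 + M1*M3*M4*U2 + M1*M2*M4*U3 + M1*M2*M3*U4)
print(pfa_counts(f)[0] == eq_337)   # True
```

The general form also makes it easy to see that each factor's cost is weighted by the number of small-N DFTs of that length, N/M_i.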

As shown in the PFA theory chapter, the small-N algorithms can be implemented by using "shifts" instead of multiplications by 1/2. The FORTRAN programs available do not make use of these shifts. Therefore, the operations count for the PFA small-N DFTs shown in Table 3.7 is modified to produce Table 3.10. Using the results of Eqs (3.333) through (3.338), the N adds required for the output mapping, and Table 3.10, the numbers of real multiplications and additions are listed for all permissible N values in Table 3.11a and b. The corresponding graphs in Figures 3.35 and 3.36 show the multiplications and additions as a function of N.

Even though this FORTRAN program did not use a shift to perform multiplication by 1/2, incorporating shifts into the small-N DFTs represents a significant savings of real multiplications. The major benefit would be in small computers, where software multiplies are more costly relative to additions. The benefit of performing multiplications by using shifts is given in Table 3.11a and b under the PCT (percentage) column. PCT was calculated by:

PCT = ((M - MS) * 100)/M (3.339)

where M is the number of multiplications without using shifts and MS is the number using shifts. The percentage savings is plotted as a function of N in Figure 3.37 for all values of N.

3.5.8 Memory Requirements for PFA. The PFA program listed in Appendix I requires 770 words of program memory when compiled for the CDC Cyber 74 computer. The memory array requirements are given by:

X, Y, A, B: length N

The memory array required by PFA is given by:

memory array = 4N
TABLE 3.11a

PFA REAL OPERATIONS AND MEMORY COUNT FOR N<72

TABLE 3.11b

PFA REAL OPERATIONS AND MEMORY COUNT FOR N>80
Figure 3.36. Real Additions for the PFA.

Figure 3.38. Memory Array Required by PFA.


and is listed in Table 3.11a and b and plotted in Figure 3.38.

3.5.9 Summary. Two algorithms which use high-speed convolution techniques have been presented. Both use convolution for computing small-N DFTs, and both require N to be factored into relatively prime factors. This particular factorization uses the Chinese Remainder Theorem and the "Sino correspondence" to reorder the data arrays. The theory, structure, and operations count were presented in this section.
IV. Comparison Results of Efficient Discrete Fourier Transforms

4.1 Introduction

Several fixed radix and mixed radix algorithms have been studied, and the number of real operations and memory count required have been computed in the preceding sections. The results from these sections are compared and presented here.

Tradeoffs and advantages of fixed radix and mixed radix algorithms are discussed, the justification for selecting Singleton's algorithm over the IMSL mixed radix FFT is given, tables and graphs comparing the conventional mixed radix FFT with the fast convolution algorithms (WFTA and PFA) are presented, and the advantages of each are discussed. This chapter concludes with an algorithm which selects the most efficient algorithm based on memory available, machine speed, zero-packing, and sequence length. A flowchart implementation of the algorithm is included.

The timing tests in this section used the Cyber 74 system clock. This clock was accessed using the FORTRAN function SECOND(CP), which provides a timer accurate to .001
seconds.  The  transforms  were  all  performed  using  samples 
from  the  function  e  ^  cos  SOTit  which  has  the  magnitude 
transform  shown  in  Figure  4.1  for  N=625. 
The memory comparisons made in this chapter are based on memory array requirements. The program memory required after compilation on the Cyber 74 is not applicable to smaller machines and would not permit valid memory comparisons. The program memory required for the Cyber 74 is given to show the relative sizes of the algorithms.

4.2 Conventional Radix-3 vs R(u) Field Radix-3

In the previous chapter the real operations count for these two radix-3 FFTs was given in Table 3.2. From this table the more efficient radix-3 algorithm can be selected based on machine speed. Validation of this table was performed using the CDC Cyber 74 computer, which has a 1.1 multiply-to-add ratio (see Appendix J for the test data).

With  a  1.1  multiply-to-add  ratio  Table  3.2  indicates 
that  the  conventional  radix-3  algorithm  is  more  efficient 
for  all  sequence  lengths  shown.  The  timing  results  in 
Table  4.1  verify  this  conclusion. 

4.3 Fixed Radix vs Mixed Radix FFTs

In Sections 3.3 and 3.4 the real operations count and memory requirements were developed for the fixed radix and mixed radix FFTs. Using the results from these sections, the real operations count and memory requirements are given in Table 4.2 along with results from timing tests conducted on the CDC Cyber 74. This table demonstrates that Singleton's mixed radix FFT (MFFT) reduces the operations count for factors of 2, 3, and 5 to the level of the fixed radix algorithms.




TABLE 4.1

RADIX-3 TIMING COMPARISON

     N    Conventional Radix-3    R(u) Field Radix-3
          Time (sec)              Time (sec)
    27    .002                    .003
    81    .009                    .011
   243    .026                    .034
   729    .094                    .117
  2187    .305                    .393


The program memory required by each algorithm is given in Table 4.3. The large size of the MFFT is due to the extra sections needed to transform a sequence of any length and the extra FORTRAN code required to perform multi-variate transforms. None of the other FFTs is capable of performing multi-variate transforms without a significant amount of additional user programming. Singleton's MFFT can perform up to a tri-variate transform; however, this additional flexibility is a disadvantage on memory limited computers when performing single-variate FFTs.

The fixed radix and mixed radix FFTs are roughly equivalent in efficiency. The fixed radix FFTs offer a memory savings over the MFFT for all radix-2 transform sequence lengths shown in Table 4.2 and for some of the radix-3 and radix-5 transform lengths. The main advantage the MFFT offers is the capability to transform a sequence of any length N, while the fixed radix algorithms are limited to integer powers of 2, 3, and 5.


4.4 Mixed Radix FFT Comparison: IMSL vs Singleton

In Chapter 3 and Appendix G the real operations and memory required for the IMSL and Singleton's mixed radix FFTs were derived as a function of N. These two algorithms are now compared on the basis of real operations and memory, and the better algorithm is selected.
TABLE 4.3

PROGRAM MEMORY REQUIRED BY FFTs

  Algorithm                    Program Memory (words)
  Radix-2
  Radix-3
  Radix-5
  Singleton's Mixed Radix      1100


The expressions for real multiplications and additions of Singleton's MFFT are subtracted from the corresponding IMSL FFT expressions to show the extra operations required by IMSL. Recall that both the Singleton and IMSL versions of the FFT compute sine and cosine using the difference equation of Section 3.1. Both implement the sine and cosine computation similarly and require the same number of real operations to compute them.

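Assuming that N can be factored as:

$$N = 2^{r}\, 3^{s}\, 4^{t}\, 5^{u}\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k} \qquad (4.1)$$

where p_1, ..., p_k are the remaining prime factors of N (other than 2, 3, and 5) and m_1, ..., m_k are their multiplicities,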

the difference in real multiplications between the IMSL and Singleton FFTs becomes:

$$\Delta_{\text{mult}} = [\text{IMSL multiplication expression}] - [\text{Singleton multiplication expression}]$$

$$\begin{aligned}
\Delta_{\text{mult}} = {} & \Big[ 2rN + 4sN + 3tN + 8 + \frac{32uN}{5} + \sum_{i=1}^{k} \Big( 2(p_i - 1) + \frac{4 m_i N (p_i - 1)}{p_i} + \frac{m_i N (p_i - 1)^2}{p_i} \Big) - 4(N-1) + \mathrm{KMULT} \Big] \\
& - \Big[ 2rN + 4sN + 3tN + \frac{32uN}{5} + \sum_{i=1}^{k} \Big( 2(p_i - 1) + \frac{m_i N (p_i - 1)^2}{p_i} + \frac{4 m_i N (p_i - 1)}{p_i} \Big) - 4(N-1) + \mathrm{KMULT} \Big] \\
= {} & 8 \qquad (4.2)
\end{aligned}$$

For large values of N the difference in multiplications is negligible.
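The difference in real additions is derived from:

$$\Delta_{\text{add}} = [\text{IMSL addition expression}] - [\text{Singleton addition expression}]$$

$$\begin{aligned}
\Delta_{\text{add}} = {} & \Big[ 3rN + 6sN + \frac{15tN}{2} + 4 + \frac{48uN}{5} + \sum_{i=1}^{k} \Big( (p_i - 1) + \frac{8 m_i N (p_i - 1)}{p_i} + \frac{m_i N (p_i - 1)^2}{p_i} \Big) - 2(N-1) + \mathrm{KADD} \Big] \\
& - \Big[ 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN + \sum_{i=1}^{k} \Big( (p_i - 1) + \frac{7 m_i N (p_i - 1)}{p_i} + \frac{m_i N (p_i - 1)^2}{p_i} \Big) - 2(N-1) + \mathrm{KADD} \Big] \\
= {} & \frac{2sN}{3} + 2tN + \frac{8uN}{5} + 4 + \sum_{i=1}^{k} \frac{m_i N (p_i - 1)}{p_i} \qquad (4.3)
\end{aligned}$$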
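The cancellation behind Eq (4.3) can be checked mechanically. The Python sketch below evaluates the IMSL and Singleton addition expressions term by term for a chosen factorization and compares their difference with the closed form of Eq (4.3). KADD is common to both expressions and is omitted, and exact fractions avoid rounding in the N(p_i - 1)/p_i terms. The multiplication expressions differ only by the constant 8, so they are not repeated; all function names are illustrative.

```python
# Sketch: verify Eq (4.3) by evaluating the IMSL and Singleton addition
# expressions term by term. N = 2^r 3^s 4^t 5^u * prod(p_i^m_i).
# KADD is common to both programs and cancels, so it is left out.
from fractions import Fraction as F
from math import prod

def seq_len(r, s, t, u, primes):
    return 2**r * 3**s * 4**t * 5**u * prod(p**m for p, m in primes)

def odd_terms(coeff, n, primes):
    # sum over remaining primes of (p-1) + coeff*m*N(p-1)/p + m*N(p-1)^2/p
    return sum((p - 1) + F(coeff * m * n * (p - 1), p) + F(m * n * (p - 1) ** 2, p)
               for p, m in primes)

def imsl_adds(r, s, t, u, primes):
    n = seq_len(r, s, t, u, primes)
    return (3*r*n + 6*s*n + F(15*t*n, 2) + 4 + F(48*u*n, 5)
            + odd_terms(8, n, primes) - 2*(n - 1))

def singleton_adds(r, s, t, u, primes):
    n = seq_len(r, s, t, u, primes)
    return (3*r*n + F(16*s*n, 3) + F(11*t*n, 2) + 8*u*n
            + odd_terms(7, n, primes) - 2*(n - 1))

def delta_adds_eq43(r, s, t, u, primes):
    n = seq_len(r, s, t, u, primes)
    return (F(2*s*n, 3) + 2*t*n + F(8*u*n, 5) + 4
            + sum(F(m * n * (p - 1), p) for p, m in primes))

args = (1, 1, 1, 1, [(7, 1)])                 # N = 2*3*4*5*7 = 840
print(imsl_adds(*args) - singleton_adds(*args), delta_adds_eq43(*args))   # 4308 4308
```

For N = 840 the IMSL FFT needs 4308 more real additions than Singleton's, while the multiplication counts differ by only 8.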

The results from Eqs (4.2) and (4.3) demonstrate that the IMSL FFT has approximately the same number of real multiplications but requires significantly more additions than Singleton's mixed radix algorithm. Based on these results, and because the data reordering for the two subroutines is the same, the Singleton FFT is the more efficient of the two subroutines. This conclusion was confirmed by timing tests on the CDC Cyber 74 computer at AFIT. The results are shown in Table 4.4 for selected sequence lengths; Singleton's FFT was as fast as or faster than the IMSL FFT at every length listed except N=1008.

The memory array required for each of the algorithms was derived in the preceding chapter. Those results are now compared for N less than 200, and the percentage of array memory saved by Singleton's FFT over the IMSL FFT is plotted in Figure 4.2 using the equation:
TABLE 4.4

TIMING RESULTS FOR IMSL AND SINGLETON FFTs

     N    IMSL Time (sec)    Singleton Time (sec)
    60    .010               .008
   120    .018               .014
   125    .019               .012
   128    .013               .011
   210    .039               .036
   243    .031               .031
   256    .028               .021
   315    .054               .052
   420    .081               .072
   504    .090               .082
   625    .128               .076
   729    .107               .107
   840    .163               .150
  1008    .151               .157
  1024    .126               .092
  1250    .275               .158
  1260    .268               .231
  2048    .269               .224
  2187    .366               .364
  2520    .565               .495
Figure 4.2. Memory Array Saved Using Singleton's FFT

$$\%\ \text{savings} = \frac{(\mathrm{MEMCC} - \mathrm{MEMSNG}) \cdot 100}{\mathrm{MEMCC}} \qquad (4.4)$$

where

MEMCC = IMSL array memory
MEMSNG = Singleton's array memory

From the plot it is evident that Singleton's algorithm uses less memory than the IMSL program. The "floor" of the curve approaches 57%, which can be verified by examination of Eqs (3.172) and (3.173) for a prime number. This number represents the memory savings at the points where N is prime.

The values of M, K, KB, and JK used to compute the IWORD constant in Eq (3.173) are M=1, K=N, KB=N-2, and JK=1:

IWORD = 3 • M + 3 + MAX(4 • M + 7 + 6 • K, KB + 1 + 2 • JK) (4.5)

IWORD = 3 + 3 + MAX(6 • N + 11, N + 1) (4.6)

IWORD = 6 • N + 17 (4.7)

Now  the  memory  for  IMSL  given  that  N  is  prime  becomes: 

MEMCC  =  2  •  N  +  2(6  •  N  +  17)  (4.8) 

MEMCC  =  14  •  N  +  34  (4.9) 

The array memory required by Singleton's FFT is based on the values NP and KD. NP must be dimensioned at least one less than the product of the square-free factors of N; if at most one square-free factor is present, NP can be dimensioned to M+1, where M is the number of prime factors in N. KD is the size of arrays AT, BT, CK, and SK, where KD equals the largest prime factor in N. Using these results, the expression for array memory where N is prime becomes:

MEMSNG = 2 • N + 4 • KD + NP (4.10)
Substituting for NP and KD, this equation is:

MEMSNG = 2 • N + 4 • N + 2 (4.11)

MEMSNG = 6 • N + 2 (4.12)

Substituting Eqs (4.9) and (4.12) into the percentage expression in Eq (4.4), the savings is seen to approach approximately 57%:

% savings = ((14 • N + 34) - (6 • N + 2)) • 100/(14 • N + 34) (4.13)

% savings = (8 • N + 32) • 100/(14 • N + 34) (4.14)

As N gets large, Eq (4.14) becomes:

% savings ≈ 800N/14N ≈ 57% (4.15)

which corresponds to the results shown by Figure 4.2.
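The prime-N comparison of Eqs (4.4) through (4.14) can be tabulated with a few lines of Python. The sketch below follows the FORTRAN variable names used in the text; the function names are illustrative and it only covers the prime-N case derived above.

```python
# Sketch: IMSL vs Singleton array memory for a prime length N,
# following Eqs (4.4) through (4.12).
from fractions import Fraction

def imsl_memory_prime(n):
    m, k, kb, jk = 1, n, n - 2, 1                                   # prime N
    iword = 3 * m + 3 + max(4 * m + 7 + 6 * k, kb + 1 + 2 * jk)     # Eq (4.5)
    return 2 * n + 2 * iword                                        # Eq (4.8)

def singleton_memory_prime(n):
    kd = n            # the largest prime factor of a prime N is N itself
    np_len = 1 + 1    # NP = M + 1 with M = 1 prime factor
    return 2 * n + 4 * kd + np_len                                  # Eq (4.10)

def pct_savings(n):
    memcc, memsng = imsl_memory_prime(n), singleton_memory_prime(n)
    return Fraction(memcc - memsng, memcc) * 100                    # Eq (4.4)

# The savings falls from about 66.7% at N=7 toward 800/14 = 57.14%.
for n in (7, 101, 1009, 10007):
    print(n, imsl_memory_prime(n), singleton_memory_prime(n),
          round(float(pct_savings(n)), 2))
```

Because the limit is approached from above, 57% acts as the "floor" seen in Figure 4.2.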

The memory array must be added to the program memory to determine the size of the program. The program memory required by each algorithm was determined by compiling each algorithm for the CDC Cyber 74. The IMSL FFT used 1061 words and the Singleton FFT used 1100 words. The larger size of the Singleton FFT relative to the IMSL version is due to the extra FORTRAN code needed to perform multi-variate FFTs. These program memory figures are only applicable to the FORTRAN compiler used here at AFIT; however, they do provide a relative measure of the program memory size. Singleton's program requires about 3.7% more program memory.

The results for real operations count and memory required show that Singleton's mixed radix FFT is superior to the IMSL FFT. For this reason, Singleton's FFT is the mixed radix FFT made available for comparison to the WFTA and PFA in the following sections.

4.5 Conventional vs Fast Convolution Algorithms

Singleton's algorithm (MFFT) is referred to as a "conventional" FFT because it uses the Cooley-Tukey decimation and reordering of the data array. The WFTA and PFA use Winograd's small-N fast convolution algorithms to perform the DFT. The operation and memory array counts are presented in Figures 4.3 and 4.4 and Tables 4.5a and b as a function of N for comparison of the three algorithms. These tables and plots illustrate the advantages and disadvantages of each algorithm and are used, along with the fixed radix results in Table 4.2, to select the most efficient algorithm for a particular sequence length and machine capability (size and speed).

The tables and plots refer to the algorithms as MFFT (Singleton), WFTA (Winograd), and PFA (Kolba-Parks). The PFA used for operation counts and memory comparisons is the one described by Burrus and Eschenbacher, which includes prime power factors of 2, 3, 4, 5, 7, 8, 9, and 16. The FORTRAN coded program for the PFA was obtained from C. S. Burrus of Rice University and does not make use of "shifts" for multiplications by 1/2. Both the WFTA and MFFT FORTRAN programs were obtained from the IEEE Press "Programs for Digital Signal Processing".
the program memory changes based on machine word length. The program memory required for the Cyber 74 is given for each algorithm so that the relative sizes can be compared.

4.5.1 Real Operations Count. The mixed radix MFFT written by Singleton includes special sections for factors of 2, 3, 4, and 5 as well as a general section for odd prime factors, which permits the transformation of a sequence of any positive integer length N. Because of the special sections, the operations count is lower for an N that is highly factorable by 2, 3, 4, or 5 than for an N containing larger prime factors. Figures 4.3 and 4.4 demonstrate the efficiency of Singleton's MFFT relative to the radix-2 complex transform multiplication and addition counts of 2N log2 N and 3N log2 N, respectively (Winograd, 1976). The MFFT operations counts shown in Figures 4.3a,b and 4.4a,b are for N factorable by 2, 3, 4, or 5, or combinations thereof. The WFTA and PFA counts are shown for all 59 sequence lengths which they can transform. Recall from Sections 3.4 and 3.5 that the WFTA and PFA sequence lengths are limited by the data reordering algorithm used by the WFTA and PFA. These figures also reflect the WFTA "post-initialization" operations count. As shown in Section 3.5, the post-initialization count is significantly less than the number of operations required for the initial transform of length N.
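The count of 59 follows from the module list: each length is a product of at most one module from each of the families {2, 4, 8, 16}, {3, 9}, {5}, and {7}, giving 5·3·2·2 = 60 products, one of which is the trivial N = 1. A Python sketch of that enumeration, written for illustration rather than taken from either program, is:

```python
# Sketch: sequence lengths available to the WFTA and PFA:
# products of relatively prime small-N modules 2, 3, 4, 5, 7, 8, 9, 16.
# Relative primality means at most one module from each prime "family".
from itertools import product

FAMILIES = [(1, 2, 4, 8, 16), (1, 3, 9), (1, 5), (1, 7)]   # 1 = family unused

def permissible_lengths():
    lengths = {a * b * c * d for a, b, c, d in product(*FAMILIES)}
    lengths.discard(1)                # N = 1 is not a transform
    return sorted(lengths)

lengths = permissible_lengths()
print(len(lengths), lengths[:6], lengths[-1])   # 59 [2, 3, 4, 5, 6, 7] 5040
```

The largest length, 5040, is also the maximum dimension used for INDX1 and INDX2 in the original WFTA (Section 3.5.6).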
Figure 4.3a. Real Multiplication Comparison for PFA, WFTA, and MFFT

Figure 4.3b. Real Multiplication Comparison for PFA, WFTA, and MFFT

Figure 4.4a. Real Addition Comparison for PFA, WFTA
data presented here were collected by timing the individual subroutines (INISHL, PERM1, WEAVE1, MULT, WEAVE2, PERM2) in the WFTA for different sequence lengths and then dividing the time required for each subroutine by the total time for all of the subroutines. Comparing the MFFT and PFA against the post-initialized WFTA is assumed to be valid because most applications of DFTs involve the repeated transform of N length sequences.

A point-by-point comparison of MFFT, WFTA, and PFA real operations is presented in Table 4.7. The sequence lengths in this table represent the only lengths permissible for both PFA and WFTA, whereas the mixed radix MFFT can transform any sequence length. The operations counts presented in Tables 4.2 and 4.7, together with a computer's multiply and add speeds, can predict the most efficient (fastest) DFT technique for that particular computer.

Using the multiply and add speeds determined for the CDC Cyber 74 (see Appendix J) as 1.9 × 10^-6 seconds and 1.7 × 10^-6 seconds, respectively, the algorithms' execution speeds were predicted from the operations counts in Tables 3.9 and 4.7. The predicted execution speeds do not account for all of the actual execution time measured, as shown in Figure 4.5. The extra time which was not predicted by the real operations count comes from array indexing and data reordering needed in all of
TABLE 4.6

TIMING RESULTS FROM THE WFTA SUBROUTINES

     N    INISHL   PERM 1   WEAVE 1   MULT   WEAVE 2   PERM 2
   315    48.0%    7.5%     16.3%     4.5%   16.3%     7.4%
   360    47.0%    5.9%     15.7%     5.9%   21.6%     3.9%
   630    43.9%    5.6%     18.7%     5.6%   21.5%     4.7%
   720    44.0%    3.5%     20.0%     6.1%   22.8%     3.6%
   840    34.5%    5.5%     23.6%     6.4%   23.6%     6.4%
  1008    48.0%    1.7%     19.2%     6.2%   21.5%     3.4%
  1260    38.2%    5.3%     18.1%     6.4%   27.7%     4.3%

Results are given as % of total time to execute WFTA.
AND PREDICTED TIMING RESULTS FOR MFFT, WFTA, AND
WFTA1 uses the initialization subroutine and WFTA2

based only on real operations are sufficient to select the most efficient algorithm, as demonstrated by Table 4.7.

The timing results in Table 4.7 compare one-to-one with the predicted times (given the standard deviations shown in parentheses) for all three algorithms. Several observations can be made from Table 4.7. First, WFTA1, which represents the initial transform made by the WFTA, may be slower than MFFT for certain sequence lengths. Examples are N=315, 630, and 720, all of which were correctly predicted to be slower from the operations counts in Tables 3.9 and 4.6. Second, the post-initialized WFTA2 and the PFA were predicted to be, and are, faster than MFFT for all sequence lengths. Third, the PFA and WFTA2 (post-initialization) are close in efficiency for all sequence lengths.

4.5.2 Memory. The memory array counts for MFFT, WFTA, and PFA were compiled from the previous chapter and are presented in Figure 4.6 and Table 4.5a and b. The figure clearly demonstrates how much less array memory is required by the MFFT. These results are due to the efficient data reordering technique of the MFFT, which can essentially be done in place with very little additional memory relative to the sequence length. The WFTA and PFA base their data reordering on the Chinese Remainder Theorem, and the PFA requires two additional length-N arrays. The WFTA uses even more array memory because of the algorithm's structure, which "nests" the multiplications inside all the additions. This requires
Figure 4.6. Memory Arrays Required by MFFT, WFTA, and PFA


three additional arrays of length NMULT (COEF, SR, and SI) to store the multiplication coefficients and to provide working array storage, because the WFTA is not computed in place.

The program memory was not included in these tabulations for comparison because the program memory required depends on the machine word size. The program memory required on the Cyber 74 for each algorithm is:

PFA program memory = 770 words
WFTA program memory = 2348 words
MFFT program memory = 1100 words

These results were obtained with the standard compiler command FTN for the FORTRAN IV language. For short sequences these program memory requirements contribute significantly to the choice of the most memory efficient algorithm.

4.5.3 WFTA vs PFA Operations Count. The tradeoffs between WFTA and PFA for real multiplications and additions can be seen in Figures 4.3 and 4.4. In most cases the WFTA requires fewer multiplications but more additions than the PFA.

The selection of the most efficient algorithm then becomes dependent on the machine speed of a real addition compared to a real multiplication. As an example of this tradeoff between additions and multiplications, consider the case of N=630.

For this sequence length the PFA requires 4352 multiplications and 18534 additions, while the WFTA requires 2376 multiplications and 22072 additions. Assuming a machine add time of 1.7 x 10^-6 seconds and a multiply time of 1.9 x 10^-6 seconds, the PFA was faster for the selected add and multiply speeds. However, consider the case where a multiply requires three times the add time, or 5.1 x 10^-6 seconds. For the same N=630 the PFA time is predicted to be .054 seconds and the WFTA time .050 seconds. With the increase in multiply time from 1.9 to 5.1 microseconds the WFTA became the more efficient algorithm. This example illustrates why the add and multiply speeds must be known to select the fastest algorithm for a particular sequence length N.
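The arithmetic of this example is easy to reproduce. The sketch below assumes, as the text does, that the predicted time is just (real multiplications x multiply time) + (real additions x add time); the function and variable names are illustrative and are not taken from the thesis programs.

```python
def predicted_time(mults, adds, t_mult, t_add):
    """Predicted transform time (seconds) from real operation counts alone."""
    return mults * t_mult + adds * t_add


PFA_630 = (4352, 18534)   # (real multiplications, real additions) for N=630
WFTA_630 = (2376, 22072)
T_ADD = 1.7e-6            # seconds per real addition

for t_mult in (1.9e-6, 5.1e-6):  # original multiply time, then three times the add time
    pfa = predicted_time(*PFA_630, t_mult, T_ADD)
    wfta = predicted_time(*WFTA_630, t_mult, T_ADD)
    winner = "PFA" if pfa < wfta else "WFTA"
    print(f"multiply {t_mult * 1e6:.1f} us: PFA {pfa:.4f} s, WFTA {wfta:.4f} s -> {winner}")
```

With the 1.9 microsecond multiply this gives about 0.0398 s for the PFA and 0.0420 s for the WFTA; with the 5.1 microsecond multiply it gives about 0.0537 s and 0.0496 s, in line with the .054 and .050 seconds quoted above.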

The effects of changing the multiply-to-add ratio from 1 to 20 are shown in Figures 4.7a, b, and c for MFFT, WFTA, and PFA. For the sequences N=315 and 1008 the PFA is most efficient at low multiply-to-add ratios, but as multiplies become "more costly" the WFTA soon becomes the most efficient. For N=30 the WFTA is the most efficient for all ratios.
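A related quantity is the multiply-to-add ratio at which the PFA and WFTA break even. Measuring time in units of one addition, the two predicted times are equal when M_PFA*r + A_PFA = M_WFTA*r + A_WFTA, so r* = (A_WFTA - A_PFA)/(M_PFA - M_WFTA). The minimal sketch below evaluates this with the N=630 counts of Section 4.5.3; the helper name is hypothetical, and it assumes the PFA has more multiplications and fewer additions than the WFTA.

```python
def crossover_ratio(pfa, wfta):
    """Multiply-to-add time ratio above which the WFTA beats the PFA.

    pfa and wfta are (real multiplications, real additions) tuples. Assumes the
    PFA uses more multiplications and fewer additions, as for most N.
    """
    (m_pfa, a_pfa), (m_wfta, a_wfta) = pfa, wfta
    return (a_wfta - a_pfa) / (m_pfa - m_wfta)


r_star = crossover_ratio((4352, 18534), (2376, 22072))  # N=630 counts
print(f"N=630: WFTA faster once a multiply costs more than {r_star:.2f} adds")
```

For N=630 this gives r* of about 1.79. The 1.9/1.7 (about 1.12) ratio of the first N=630 example lies below it, so the PFA wins; a ratio of 3 lies above it, so the WFTA wins.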

4.6 Flexibility of the DFT Algorithms

It is clear from the plots in Figures 4.3 and 4.4 and the data in Table 4.2 that the fixed radix FFT, PFA, and WFTA are somewhat limited in permissible sequence lengths, whereas the mixed radix FFT provides a much more "dense" selection even for sequence lengths factorable by only 2, 3, 4, or 5. The restriction in possible values for N

Figure 4.7c. Relative Efficiencies of MFFT, WFTA, and PFA

4.7 Algorithm to Select the Most Efficient DFT Technique

The  results  of  this  chapter  are  used  to  develop  a 
systematic  approach  to  selecting  the  most  efficient  DFT 
method  from  the  fixed  radix  FFTs,  mixed  radix  FFT  (MFFT) , 
WFTA,  and  PFA.  A  flowchart  is  presented  which  selects  the 
most  efficient  algorithm  based  on  real  operations,  computer 
memory, machine speed, and sequence length. The algorithm requires inputs of machine speed for add and multiply, sequence length, zeropack limits, and computer memory. This algorithm also assumes that the same length sequence will be repeatedly transformed, so that the WFTA is initialized only once.

4.7.1 Arguments. The algorithm requires the inputs:

N: Sequence length to be transformed

NP: The upper limit to which the sequence length can be filled to reach an efficient transform length.

A:  Machine  addition  speed 

M:  Machine  multiplication  speed 
4.7.2 Usage. The algorithm is presented as a flowchart. The basic logic of the algorithm is:

(1)  Zeropack  (if  permitted)  to  the  nearest  WFTA  or  PFA 
sequence  length. 

(2)  Determine  the  memory  requirements  for  the  WFTA  and  PFA. 

(3)  If  WFTA  and  PFA  both  fit  in  computer  memory  available, 
select  between  the  two  by  using  real  operations  and 
computer  speed. 

(4) If only the PFA or only the WFTA fits in computer memory, select the one that fits.

(5) If neither PFA nor WFTA will fit in computer memory, zeropack to the nearest N that is an integer power of 2, 3, or 5. Choose the most efficient algorithm from the fixed radix FFT and MFFT based on real operations counts and machine speed.

(6) If a fixed radix FFT cannot be used, zeropack to the nearest N factorable by 2, 3, or 5 and use the mixed radix FFT.
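The WFTA/PFA branch of this logic can be sketched as a small function. This is a simplified illustration under stated assumptions, not a transcription of Figure 4.8: the `table` argument is a hypothetical stand-in for Tables 4.5 and 4.6 (operation counts and memory per permissible length), and the FFT fallback of steps (5) and (6) is reduced to a returned label.

```python
from typing import Dict, Tuple

# (real multiplications, real additions, memory words) for one algorithm at one length.
Cost = Tuple[int, int, int]


def select_dft(n: int, np_limit: int, t_add: float, t_mult: float, mem: int,
               table: Dict[int, Dict[str, Cost]]) -> Tuple[str, int]:
    """Simplified version of steps (1)-(6); returns (algorithm, transform length).

    `table` (hypothetical) maps each permissible WFTA/PFA length to
    {"WFTA": cost, "PFA": cost}.
    """
    # (1) Zeropack to the nearest WFTA/PFA length that does not exceed NP.
    candidates = [length for length in sorted(table) if n <= length <= np_limit]
    if candidates:
        length = candidates[0]
        # (2) Keep only the algorithms whose memory requirement fits.
        fits = {name: cost for name, cost in table[length].items() if cost[2] <= mem}
        if fits:
            # (3)/(4) Among those that fit, pick the smallest predicted time.
            def seconds(cost: Cost) -> float:
                return cost[0] * t_mult + cost[1] * t_add
            return min(fits, key=lambda name: seconds(fits[name])), length
    # (5)/(6) Neither fits, or no length within NP: fall back to the FFTs,
    # whose own zeropacking and choice are not modelled in this sketch.
    return "fixed radix FFT or MFFT", n


# N=420 counts from Table 4.6; memory entries are placeholders (0 words)
# because the worked example assumes no memory limitation.
table = {420: {"WFTA": (1296, 11352, 0), "PFA": (2528, 10956, 0)}}
print(select_dft(410, 451, 450e-9, 1000e-9, mem=10**6, table=table))  # ('WFTA', 420)
```

Passing the tables in as data keeps the decision logic separate from the machine-specific operation counts, so the same function could be reused with counts for another computer.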

The flow diagram of Figures 4.8a, b, and c, used together with the specified tables, selects the most efficient algorithm.

An  example  for  N=410  demonstrates  the  use  of  Figure  4.8 
and  the  tables  in  this  paper  to  select  the  most  efficient 
DFT.  Given  that  A=450  nanoseconds  (ns),  M=1000  ns,  10% 
zeropacking  permitted,  and  no  memory  limitations,  the  most 
efficient  algorithm  can  be  selected. 

(1)  MEM  is  very  large  and  is  not  a  limitation 

(2)  N=410 

(3)  NP=410  +  .10(410)  =  451 
Figure 4.8a. Flowchart to Select Most Efficient Algorithm.

(4) NP=N? No, continue

(5) NP ≤ 5040? Yes, continue

(6) Zeropack to the nearest WFTA/PFA length given in Table 4.6, which is NP=420.

(7)  PFA  fit  in  computer?  Yes,  continue 

(8)  WFTA  fit  in  computer?  Yes,  continue 

(9) Determine the fastest algorithm between WFTA and PFA from Table 4.6. For N=420,

                 WFTA      PFA
   Mult          1296     2528
   Add          11352    10956

Using A=450 ns and M=1000 ns the predicted times are:

   WFTA = 6.4 milliseconds
   PFA  = 7.5 milliseconds

For this sequence N=420 and the given add and multiply speeds, the WFTA is the fastest algorithm. However, if the sequence were transformed only once for a particular application, so that the WFTA could not be reused without re-initialization, the WFTA counts must be taken from Table 3.11, where 49200 multiplications and 16200 additions are used to initialize the WFTA and perform the transform. The WFTA is then predicted to take 56.5 milliseconds to transform N=420. When selecting between WFTA and PFA the particular application must be considered.
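The zeropacking and timing steps of this worked example can be reproduced with a short sketch. It assumes that the permissible WFTA/PFA lengths are the products of mutually prime short-length modules, one chosen (or none) from each of the groups {2, 4, 8, 16}, {3, 9}, {5}, and {7}. That module set gives a largest length of 16*9*5*7 = 5040, which matches the 5040 limit in step (5) above, but Table 4.6 remains the authoritative list. The function names are hypothetical.

```python
from itertools import product


def wfta_pfa_lengths():
    """Assumed permissible lengths: one module per mutually prime group (1 = unused)."""
    groups = ((1, 2, 4, 8, 16), (1, 3, 9), (1, 5), (1, 7))
    lengths = {a * b * c * d for a, b, c, d in product(*groups)}
    lengths.discard(1)
    return sorted(lengths)


def zeropack(n, np_limit):
    """Smallest permissible length L with n <= L <= np_limit, or None."""
    for length in wfta_pfa_lengths():
        if n <= length <= np_limit:
            return length
    return None


n = 410
np_limit = n + n // 10            # 10% zeropacking permitted -> 451
length = zeropack(n, np_limit)    # 420 = 4 * 3 * 5 * 7

t_add, t_mult = 450e-9, 1000e-9   # seconds per real add and multiply
counts = {"WFTA": (1296, 11352), "PFA": (2528, 10956)}  # N=420, post-initialization
for name, (mults, adds) in counts.items():
    print(f"{name}: {1e3 * (mults * t_mult + adds * t_add):.1f} ms")  # 6.4 ms, 7.5 ms
```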

It should also be noted that the predicted times from Table 4.6 are based only on real operations, which do not account for all of the execution time, as shown by the timing tests. For the cases tested in Table 4.7 on the Cyber 74, the real operations accounted on average for 67% of the PFA, 65% of the WFTA, and 61% of the MFFT actual execution time.


V.  Conclusions 


This  paper,  for  the  first  time,  presented  a  capability 
to  select  the  most  efficient  DFT  based  on  real  operations. 
These  real  operations  were  tabulated  and  plotted  as  a 
function  of  N.  The  algorithms  studied  and  compared  for  real 
operations  and  memory  include: 

1.  Radix-2  FFT  from  Rabiner  and  Gold. 

2.  Radix-3  FFT  written  by  the  author. 

3. Radix-3 FFT in R(u) from Dubois and
Venetsanopoulos.

4.  Radix-5  FFT  written  by  the  author. 

5.  Mixed  radix  FFT  for  factors  of  2,  3,  or  5 
written  by  the  author. 

6.  IMSL  mixed  radix  FFT  which  can  transform 
any  sequence  length  N. 

7.  Singleton's  mixed  radix  FFT  which  can 
transform  any  sequence  length  N. 

8.  Winograd  Fourier  transform  algorithm  (WFTA) 
written  by  McClellan  and  Nawab. 

9. Prime Factor Algorithm (PFA) written by
Burrus and Eschenbacher.

5.1 Results and Conclusions

The two radix-3 FFTs were compared for real operations and memory required to perform the DFT of N-length sequences where N=3^m. Selection criteria were developed and tabulated based on machine speed. The new radix-3 FFT in the R(u) field uses fewer multiplications but more real additions than the conventional radix-3 FFT. The more efficient of the two algorithms depends on the relative costs of multiplications and additions. The radix-3 FFT in R(u) is most efficient when multiplications are costly.

All of the fixed radix algorithms were compared to the Singleton mixed radix FFT for real operations and memory. The operations counts show that the most efficient algorithm depends on the multiplication and addition speeds of the computer. Data were tabulated for selecting the best algorithm based on these criteria. The FFT algorithm using the least memory can also be selected from Tables 4.2 and 4.3. The limited choice of sequence lengths possible with the fixed radix FFTs reduces their utility compared to Singleton's mixed radix FFT.

Three conventional mixed radix FFT algorithms were compared for efficiency, memory array, and flexibility. The author's mixed radix FFT was very efficient but required more memory array and was not as flexible, since N was limited to factors of 2, 3, 4, and 5. It was shown that Singleton's mixed radix FFT was more efficient and more flexible, and used less memory array, than the IMSL mixed radix FFT; it was therefore chosen as the best conventional mixed radix FFT.

Singleton's mixed radix FFT (labeled MFFT) and the fixed radix FFTs were compared to the WFTA and PFA. The real operations and memory required were tabulated and plotted for all of the N-length sequences permitted by WFTA and PFA. This comparison showed that the WFTA and PFA required fewer real operations but that the FFTs required less memory. The MFFT was much more flexible than WFTA or PFA, since N can be any sequence length.

The WFTA and PFA were then studied more closely and the tradeoffs between the two were discussed. The PFA uses fewer additions but more multiplications for most N-length sequences, which means the WFTA is more efficient when multiplications are "costly" relative to additions. The PFA uses less memory than the WFTA, which makes the PFA preferable when the machine is memory limited. Further criteria considered in selecting between these two algorithms are (1) the machine language and (2) the particular application of the algorithms. If the machine language permits "shifts" to be used for multiplication by 1/2, the PFA performance can be improved. (The percentage improvements have been tabulated for all permissible PFA sequence lengths.) The second consideration affects the WFTA, since repeated use of the WFTA for the same length N sequence does not require the algorithm to re-initialize the multiplier coefficients. Improvements in operating speed of 40% over the initial WFTA were realized on the Cyber 74 for various sequence lengths.

An algorithm to select the most efficient DFT method from WFTA (Winograd), MFFT (Singleton), fixed radix FFTs, and PFA (Kolba and Parks) was presented. This selection is based on minimizing real operations and minimizing memory size for the machine used. Minimizing real operations is the best "first order" criterion (Singleton, 1969) and was verified by timing the transforms on the CDC Cyber 74. A summary of the above conclusions is presented in Table 5.1.

The  PFA  was  chosen  as  the  best  DFT  technique  because 
it  minimizes  real  operations  well  below  the  FFT  levels, 
requires  substantially  less  memory  than  WFTA,  and  is  more 
flexible  than  the  fixed  radix  FFTs.  Of  course,  the  "optimum" 
algorithm  depends  on  the  specific  application  and  computer, 
but  for  general  applications  the  PFA  provides  the  best  mix 
of  minimizing  real  operations  and  memory. 

5.2  Recommendations 

The  above  conclusions  related  to  an  algorithm's 
efficiency  were  based  on  real  operations  and  then  verified 
by timing tests on the CDC Cyber 74. The Cyber 74 is a representative large mainframe computer with very high speed additions and multiplications.

To  further  substantiate  the  conclusions  of  this  paper 
it  is  recommended  that  similar  timing  tests  be  made  on  other 
computers  (large  and  small)  available  at  AFIT  and  the  results 
compared  to  the  predicted  efficiencies  based  on  real  additions 
and  multiplications.  All  of  the  data  necessary  to  perform 
these  tests  is  available  in  this  paper. 
Appendix A. Radix-2 FFT Algorithm

This appendix presents an algorithm for computing the complex fast Fourier transform (FFT) defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1,$$

where N = 2^M, M an integer.

A FORTRAN subroutine is listed for computing the radix-2 FFT of a single-variate forward complex Fourier transform, or one variate of a multivariate transform.

Arguments.

A = The complex array to be transformed, which is dimensioned to length N.

N = The integer sequence length to be transformed, which must equal 2^M.

M = The integer power of 2.

Usage. For a single variate forward transform:

(1) Specify the input complex sequence A along with parameters M and N.

(2) Dimension complex array A to length N.

(3) Call FFT2C(A,M,N).

(4) A contains the complex output vector X(k).
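As an illustration of the same decimation-in-time idea, a recursive radix-2 sketch in Python is given here. It is not a transcription of FFT2C (which works in place on the array A); the function names are hypothetical, and the direct DFT is included only to check the result.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) exp(-j 2 pi n k / N)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_len) for n in range(n_len))
            for k in range(n_len)]


def fft2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be 2**M."""
    n_len = len(x)
    if n_len < 1:
        raise ValueError("empty sequence")
    if n_len == 1:
        return [complex(x[0])]
    if n_len % 2:
        raise ValueError("sequence length must be a power of 2")
    half = n_len // 2
    even = fft2(x[0::2])  # N/2-point DFT of x(2r)
    odd = fft2(x[1::2])   # N/2-point DFT of x(2r+1)
    out = [0j] * n_len
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n_len) * odd[k]  # twiddle factor W_N^k
        out[k] = even[k] + t         # butterfly output k
        out[k + half] = even[k] - t  # uses W_N^(k+N/2) = -W_N^k
    return out


x = [complex(n, -n) for n in range(8)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft2(x), dft(x)))
```

The recursion makes the butterfly structure explicit; an in-place program such as the one described here instead reorders the data (bit reversal) and runs the butterflies stage by stage.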
Appendix B. Radix-3 FFT Algorithm

This section presents an algorithm for computing the fast Fourier transform (FFT) based on a method called decimation-in-time, described in Chapter III. This algorithm is an efficient method for computing the transformation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, 2, \ldots, N-1,$$

where X(k) and x(n) are complex valued. This algorithm requires that the sequence length be N = 3^m, m = 0, 1, 2, ....

This appendix lists a FORTRAN subroutine for computing the radix-3 FFT. This subroutine computes the single-variate complex Fourier transform or calculates one variate of a multivariate transform.


Arguments.

A = The real part of the array to be transformed, which is dimensioned to length N.

B = The imaginary part of the array to be transformed, which is dimensioned to length N.

M = The exponent of 3.

N = The length of the data sequence (N = 3^M).

IW = A work vector of length M.

WKS and WKC = Storage arrays of length N used for sine and cosine lookup tables.

Usage. For a single variate forward transform:

(1) Specify the input sequences A and B along with the parameters M and N.

(2) Dimension A, B, IW, WKS, and WKC to the correct lengths.

(3) Call FFT3TM(A,B,M,N,IW,WKC,WKS).

(4) A and B are the output real and imaginary portions of the complex vector X(k).
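A recursive radix-3 decimation-in-time sketch in Python may help connect the butterfly to the DFT definition. It assumes nothing about the internals of FFT3TM (which works in place on the separate real and imaginary arrays A and B); the function names are hypothetical. Each radix-3 butterfly applies the twiddle factors W_N^k and W_N^2k and then the constant multiplier W_3 = exp(-j2pi/3).

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of the DFT, used only as a check."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_len) for n in range(n_len))
            for k in range(n_len)]


def fft3(x):
    """Recursive radix-3 decimation-in-time FFT; len(x) must be 3**M."""
    n_len = len(x)
    if n_len < 1:
        raise ValueError("empty sequence")
    if n_len == 1:
        return [complex(x[0])]
    if n_len % 3:
        raise ValueError("sequence length must be a power of 3")
    third = n_len // 3
    # N/3-point DFTs of x(3r), x(3r+1), x(3r+2)
    s0, s1, s2 = fft3(x[0::3]), fft3(x[1::3]), fft3(x[2::3])
    w3 = cmath.exp(-2j * cmath.pi / 3)  # constant butterfly multiplier W_3
    out = [0j] * n_len
    for k in range(third):
        w = cmath.exp(-2j * cmath.pi * k / n_len)  # twiddle factor W_N^k
        a, b, c = s0[k], w * s1[k], w * w * s2[k]
        out[k] = a + b + c
        out[k + third] = a + w3 * b + w3 * w3 * c
        out[k + 2 * third] = a + w3 * w3 * b + w3 * c  # since W_3^4 = W_3
    return out


x = [complex(n % 4, n % 5) for n in range(27)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft3(x), dft(x)))
```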
C     *** ARRAYS ARE NOW SHUFFLED ***
C     COMPUTE BF CONSTANTS
      CONST=.8660254037844
C     THIS LOOP COUNTS THE STAGE NO.  THE NO. OF STAGES = M
      XTIME=SECOND(CP)
      DO 30 L=1,M
C     THE INTEGER D IS THE DISTANCE BETWEEN BUTTERFLIES (BF)
C     WHICH HAVE THE SAME COMPLEX TWIDDLE FACTORS (TF)
      D=3**L
C     TYPES OF BF IN STAGE WHICH USE DIFFERENT TFS
      LM1=L-1
      TYPE=3**LM1
C     INITIALIZE THE TWIDDLE FACTORS:
      TFA1=1.
      TFB1=0.
      TFA2=1.
      TFB2=0.
      TFA3=1.
      TFB3=0.
C     COMPUTE DISTANCE BETWEEN BF ENDPOINTS FOR THIS STAGE
      P=TYPE
      PX2=P*2
C     COMPUTE INDEX CONSTANTS FOR COS & SIN LOOKUP TABLES
      K1=N/D
      K2=2*K1
C     THIS LOOP INDEXES THE BF WITH SAME TFS, INDEXES THE TF
      DO 40 J=1,TYPE
C     FIRST STAGE HAS NO TFS SO SKIP TF COMPUTATION
      IF(L.EQ.1) GO TO 60
      IF(J.EQ.1) GO TO 60
Appendix C. Radix-3 FFT in R(u)

This appendix presents an algorithm for computing the radix-3 FFT based on a method which transforms the array from the complex domain (1, i) to the R(u) domain (1, u).


Arguments

A = Real portion of the complex data sequence to be transformed. It is dimensioned to length N.

B = Imaginary portion of the complex data sequence to be transformed. It is dimensioned to length N.

M = The exponent of 3.

N = The length of the data sequence (N = 3^M).

IW = Work vector dimensioned to length M.

WKC and WKS = Storage arrays dimensioned to length N and used for sine and cosine lookup tables.

RTEST = Set equal to zero or one. If the data sequence is real, RTEST=1; if the data sequence is complex, RTEST=0.

Usage. This algorithm is an efficient method for computing the transformation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1,$$

where X(k) and x(n) are complex valued. This algorithm restricts N to equal 3^M, where M = 0, 1, 2, ....

For a single variate forward transform:

(1) Specify the input sequences A and B along with parameters M, N, and RTEST.

(2) Dimension A, B, WKC, WKS, and IW.

(3) Call FFT3RU(A,B,M,N,IW,WKC,WKS,RTEST).

(4) A and B are the output real and imaginary portions of the complex vector X(k).
Appendix D. Radix-5 FFT Algorithm

This section presents an algorithm for computing the FFT based on decimation-in-time of the discrete Fourier transform defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, 2, \ldots, N-1,$$

where X(k) and x(n) are complex valued. This algorithm restricts the length of the sequence to N = 5^m, where m is an integer.

In this appendix a FORTRAN subroutine FFT5TF is listed for computing the radix-5 FFT. This subroutine computes the single-variate complex Fourier transform or performs the calculation for one variate of a multivariate transform.

Arguments

A = Real portion of the complex data sequence to be transformed. It is dimensioned to length N.

B = Imaginary portion of the complex data sequence. It is dimensioned to length N.

M = Exponent of 5, where N = 5^M.

N = Length of the data sequence (N = 5^M).

IW = Work vector of length M.

WKC and WKS = Storage arrays dimensioned to length N and used for sine and cosine lookup tables.

Usage. For a single variate forward transform:

(1) Specify the input sequences A and B along with the parameters M and N.

(2) Dimension A, B, IW, WKS, and WKC to the correct lengths.

(3) A and B are the output real and imaginary portions of the complex vector X(k).
Radix-5 FFT Theory

This section presents the theory of the radix-5 FFT, starting with the DFT definition and then decomposing the DFT equation using the decimation-in-time algorithm (Cooley and Tukey, 1965). This development closely parallels the radix-3 development presented earlier, and consequently the radix-5 theory will be brief.

The DFT X(k) is computed by separating the discrete time sequence x(n) into five N/5-point sequences (N must be of length 5^m, m = 0, 1, 2, ...). X(k) is given by the DFT expression:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j2\pi/N} \tag{D.1}$$

Breaking x(n) into five N/5-point sequences yields x(5r), x(5r+1), x(5r+2), x(5r+3), and x(5r+4). Using these sequences and Eq (D.1) gives:

$$X(k) = \sum_{r=0}^{N/5-1} x(5r)\, W_N^{5rk} + \sum_{r=0}^{N/5-1} x(5r+1)\, W_N^{(5r+1)k} + \sum_{r=0}^{N/5-1} x(5r+2)\, W_N^{(5r+2)k} + \sum_{r=0}^{N/5-1} x(5r+3)\, W_N^{(5r+3)k} + \sum_{r=0}^{N/5-1} x(5r+4)\, W_N^{(5r+4)k} \tag{D.2}$$

By regrouping exponents and making the substitution

$$W_N^{5r} = W_{N/5}^{r} \tag{D.3}$$

Eq (D.2) can be written in final form as:
$$X(k) = \sum_{r=0}^{N/5-1} x(5r)\, W_{N/5}^{rk} + W_N^{k} \sum_{r=0}^{N/5-1} x(5r+1)\, W_{N/5}^{rk} + W_N^{2k} \sum_{r=0}^{N/5-1} x(5r+2)\, W_{N/5}^{rk} + W_N^{3k} \sum_{r=0}^{N/5-1} x(5r+3)\, W_{N/5}^{rk} + W_N^{4k} \sum_{r=0}^{N/5-1} x(5r+4)\, W_{N/5}^{rk} \tag{D.4}$$

Each of the N/5-point DFTs in Eq (D.4) represents an N/5-length sequence, and the terms in front of the summations are the butterfly multipliers.

Eq (D.4) can be rewritten to reflect the N/5-point DFTs as:

$$X(k) = A(m) + W_N^{k} B(m) + W_N^{2k} C(m) + W_N^{3k} D(m) + W_N^{4k} E(m), \qquad m = k \bmod (N/5) \tag{D.5}$$

For N = 5^2 = 25 the Eq (D.5) representation is shown in Figure D.1, which uses a less cumbersome FFT notation (Rabiner and Gold, 1975). X(k) is obtained by evaluating Eq (D.5) as:

$$\begin{aligned}
X(0) &= A(0) + B(0) + C(0) + D(0) + E(0) \\
X(1) &= A(1) + W_{25}^{1} B(1) + W_{25}^{2} C(1) + W_{25}^{3} D(1) + W_{25}^{4} E(1) \\
X(2) &= A(2) + W_{25}^{2} B(2) + W_{25}^{4} C(2) + W_{25}^{6} D(2) + W_{25}^{8} E(2) \\
&\;\;\vdots \\
X(6) &= A(1) + W_{25}^{6} B(1) + W_{25}^{12} C(1) + W_{25}^{18} D(1) + W_{25}^{24} E(1) \\
&\;\;\vdots \\
X(24) &= A(4) + W_{25}^{24} B(4) + W_{25}^{48} C(4) + W_{25}^{72} D(4) + W_{25}^{96} E(4)
\end{aligned}$$

Figure D.1

The above expressions explicitly describe the first-stage decimation for N=25. The next step is to evaluate A(m) through E(m), which are also 5-point DFTs. The 5-point DFT for A(m) can be evaluated as:

$$A(m) = \sum_{r=0}^{N/5-1} a(r)\, W_{N/5}^{rm}, \qquad a(r) = x(5r) \tag{D.6}$$

which results in five N/25-length sequences:

$$A(m) = \sum_{i=0}^{N/25-1} a(5i)\, W_{N/25}^{im} + W_{N/5}^{m} \sum_{i=0}^{N/25-1} a(5i+1)\, W_{N/25}^{im} + W_{N/5}^{2m} \sum_{i=0}^{N/25-1} a(5i+2)\, W_{N/25}^{im} + W_{N/5}^{3m} \sum_{i=0}^{N/25-1} a(5i+3)\, W_{N/25}^{im} + W_{N/5}^{4m} \sum_{i=0}^{N/25-1} a(5i+4)\, W_{N/25}^{im}, \qquad m = 0, 1, \ldots, 4 \tag{D.7}$$

It can be seen from Figure D.1 that a(5i) = x(0), a(5i+1) = x(5), a(5i+2) = x(10), a(5i+3) = x(15), and a(5i+4) = x(20) for the 5-point DFT of A(m). The final expression for the A(m) 5-point DFT is given from Eq (D.7) where N=25:
Figure D.2. Basic Radix-5 Butterfly Using Twiddle Factors.
$$A(0) = a(0) + W_5^{0} a(1) + W_5^{0} a(2) + W_5^{0} a(3) + W_5^{0} a(4) \tag{D.8}$$

$$A(1) = a(0) + W_5^{1} a(1) + W_5^{2} a(2) + W_5^{3} a(3) + W_5^{4} a(4) \tag{D.9}$$

$$A(2) = a(0) + W_5^{2} a(1) + W_5^{4} a(2) + W_5^{6} a(3) + W_5^{8} a(4) \tag{D.10}$$

$$A(3) = a(0) + W_5^{3} a(1) + W_5^{6} a(2) + W_5^{9} a(3) + W_5^{12} a(4) \tag{D.11}$$

$$A(4) = a(0) + W_5^{4} a(1) + W_5^{8} a(2) + W_5^{12} a(3) + W_5^{16} a(4) \tag{D.12}$$

From Eqs (D.8) - (D.12) the basic butterfly multipliers are derived to be:

$$X(k) = A(k) + W_N^{k} B(k) + W_N^{2k} C(k) + W_N^{3k} D(k) + W_N^{4k} E(k) \tag{D.13}$$

$$X(k+r) = A(k) + W_N^{k+r} B(k) + W_N^{2k+2r} C(k) + W_N^{3k+3r} D(k) + W_N^{4k+4r} E(k) \tag{D.14}$$

$$X(k+2r) = A(k) + W_N^{k+2r} B(k) + W_N^{2k+4r} C(k) + W_N^{3k+6r} D(k) + W_N^{4k+8r} E(k) \tag{D.15}$$

$$X(k+3r) = A(k) + W_N^{k+3r} B(k) + W_N^{2k+6r} C(k) + W_N^{3k+9r} D(k) + W_N^{4k+12r} E(k) \tag{D.16}$$

$$X(k+4r) = A(k) + W_N^{k+4r} B(k) + W_N^{2k+8r} C(k) + W_N^{3k+12r} D(k) + W_N^{4k+16r} E(k) \tag{D.17}$$
Eqs (D.13) - (D.17) are shown in the twiddle factor butterfly of Figure D.2, where "r" is the distance between the butterfly end points. Since N = 5r, the butterfly multipliers reduce to the constant complex multipliers:

$$W_N^{r} = W_N^{6r} = W_N^{16r} = \cos(2\pi/5) - j\sin(2\pi/5)$$

$$W_N^{2r} = W_N^{12r} = \cos(4\pi/5) - j\sin(4\pi/5)$$

$$W_N^{3r} = \left(W_N^{2r}\right)^{*} = W_N^{8r} = \cos(4\pi/5) + j\sin(4\pi/5)$$

$$W_N^{4r} = \left(W_N^{r}\right)^{*} = W_N^{9r} = \cos(2\pi/5) + j\sin(2\pi/5)$$

These constant butterfly multipliers are computed once during the FFT computation and used in every radix-5 butterfly.
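The radix-5 butterfly can be checked numerically with a short recursive sketch. It assumes only the relations in Eqs (D.13) - (D.17): each butterfly first applies the twiddle factors W_N^(pk) to A(k) through E(k), then combines them with the four constant multipliers, two of which are complex conjugates of the other two. The names `fft5` and `ROT` are hypothetical and this is not a transcription of FFT5TF.

```python
import cmath
import math

# Constant butterfly multipliers for N = 5r: ROT[i] = W_N^(i r) = exp(-j 2 pi i / 5).
C1 = complex(math.cos(2 * math.pi / 5), -math.sin(2 * math.pi / 5))  # W_N^r
C2 = complex(math.cos(4 * math.pi / 5), -math.sin(4 * math.pi / 5))  # W_N^2r
ROT = (1 + 0j, C1, C2, C2.conjugate(), C1.conjugate())  # W_N^3r, W_N^4r are conjugates


def fft5(x):
    """Recursive radix-5 decimation-in-time FFT; len(x) must be 5**M."""
    n_len = len(x)
    if n_len < 1:
        raise ValueError("empty sequence")
    if n_len == 1:
        return [complex(x[0])]
    if n_len % 5:
        raise ValueError("sequence length must be a power of 5")
    r = n_len // 5  # distance between butterfly end points
    subs = [fft5(x[p::5]) for p in range(5)]  # A(k), B(k), C(k), D(k), E(k)
    out = [0j] * n_len
    for k in range(r):
        # Twiddle factors W_N^(p k) applied before the butterfly.
        y = [cmath.exp(-2j * cmath.pi * p * k / n_len) * subs[p][k] for p in range(5)]
        for q in range(5):
            # X(k + q r) = sum_p W_N^(p k) W_N^(p q r) S_p(k), with W_N^(p q r) = ROT[p q mod 5]
            out[k + q * r] = sum(ROT[(p * q) % 5] * y[p] for p in range(5))
    return out


def dft(x):
    """Direct DFT, used only as a check."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_len) for n in range(n_len))
            for k in range(n_len)]


x = [complex(n % 3, -(n % 7)) for n in range(25)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft5(x), dft(x)))
```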
Appendix E. Mixed Radix FFT Algorithm

This section presents an algorithm for computing the FFT based on the discrete Fourier transform:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}$$

The algorithm described here can accept an N-length sequence which is factorable by 2, 3, 4, or 5. To aid in selecting an appropriate sequence length for this algorithm, a list of numbers less than 50,000 containing no prime factors larger than five is given in Table E.
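The permissible lengths of Table E can be regenerated directly, since they are exactly the integers whose only prime factors are 2, 3, and 5. A minimal sketch (the function name is hypothetical; 1 is excluded as a trivial length):

```python
def smooth_lengths(limit=50_000):
    """Integers 1 < n < limit whose prime factors are all 2, 3, or 5."""
    found = []
    p2 = 1
    while p2 < limit:               # powers of 2
        p3 = p2
        while p3 < limit:           # times powers of 3
            p5 = p3
            while p5 < limit:       # times powers of 5
                found.append(p5)
                p5 *= 5
            p3 *= 3
        p2 *= 2
    return sorted(n for n in found if n > 1)


table_e = smooth_lengths()
print(table_e[:12])  # [2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18]
```

Each product 2^a 3^b 5^c is visited exactly once by the nested loops, so no duplicates need to be removed.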

Arguments

A = The real portion of the complex data sequence to be transformed. It is dimensioned to length N.

B = Imaginary portion of the complex data sequence to be transformed. It is dimensioned to length N.

M = Number of factors of N.

WKC and WKS = Storage arrays dimensioned to length N and used for sine and cosine lookup tables.

N = Length of the sequence to be transformed. N must be an integer power of 2, 3, 4, 5, or a combination thereof.

AT and BT = Arrays used in the subroutine for temporary storage of A and B during the data reordering (digit reversal).

NFAC = Contains all the factors of N. NFAC is computed by the user and passed to the subroutine in the argument list. Dimensioned to length M.

IWK = Contains the integer powers of 2, 3, 4, and 5 and is dimensioned to length 4.

IWK(1) = powers of 5

IWK(2) = powers of 4

IWK(3) = powers of 3

IWK(4) = powers of 2 (must be 0 or 1)

Usage .  The  subroutine  listed  permits  a  maximum  of  11 
factors  which  is  adequate  for  any  N  less  than  2^^  with  the 
factoring  used  by  this  subroutine. 

(1) Dimension arrays A, B, AT, BT, WKC, and WKS to length N and array NFAC to length M.

(2) Factor N and store the factors in array NFAC. Array NFAC must contain the factors of N starting with the highest prime factor, 5, and continuing to the lowest, 2.

E.g., N=480:

NFAC(1) = 5, NFAC(2) = 4, NFAC(3) = 4,
NFAC(4) = 3, NFAC(5) = 2.

(3) Specify the integer powers of 2, 3, 4, and 5 in the array IWK.

E.g., N=480:

IWK(1) = 1, IWK(2) = 2, IWK(3) = 1, IWK(4) = 1

In general,

N = 2^m · 3^n · 4^p · 5^q and

IWK(1) = q, IWK(2) = p, IWK(3) = n, IWK(4) = m.

(4) Specify values for A, the real part of the data sequence, and B, the imaginary part of the data sequence.

(5) Call FFTMR(A,B,M,N,WKC,WKS,AT,BT,NFAC,IWK).

(6) A and B contain the real and imaginary parts of the transform X(k).
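Steps (2) and (3) can be automated, and the resulting factor list drives a mixed radix decimation. The sketch below is a hedged illustration, not a transcription of FFTMR. `factor_for_fftmr` (hypothetical name) strips factors in the order 5, 4, 3, 2, so at most one factor of 2 remains, as required for IWK(4). `mixed_radix_fft` (also hypothetical) is a simple recursive decimation-in-time that splits by NFAC(1) first. It does not reproduce the in-place digit-reversal scheme of the listed subroutine.

```python
import cmath


def factor_for_fftmr(n):
    """Return (NFAC, IWK) ordered as in steps (2)-(3): 5s, then 4s, 3s, and one 2."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    iwk = [0, 0, 0, 0]  # powers of 5, 4, 3, 2; the power of 2 ends up 0 or 1
    rest = n
    for slot, f in enumerate((5, 4, 3, 2)):
        while rest % f == 0:
            rest //= f
            iwk[slot] += 1
    if rest != 1:
        raise ValueError(f"{n} has a prime factor larger than 5")
    nfac = [5] * iwk[0] + [4] * iwk[1] + [3] * iwk[2] + [2] * iwk[3]
    return nfac, iwk


def mixed_radix_fft(x, nfac):
    """Recursive decimation-in-time FFT splitting by nfac[0], then nfac[1:], ..."""
    if not nfac:
        if len(x) != 1:
            raise ValueError("product of nfac must equal len(x)")
        return [complex(x[0])]  # a length-1 DFT is the identity
    p = nfac[0]
    n_len = len(x)
    if n_len % p:
        raise ValueError("product of nfac must equal len(x)")
    m = n_len // p
    # DFTs of the p decimated subsequences x(p r + s), s = 0 .. p-1
    subs = [mixed_radix_fft(x[s::p], nfac[1:]) for s in range(p)]
    # X(k) = sum_s W_N^(s k) S_s(k mod m)
    return [sum(cmath.exp(-2j * cmath.pi * s * k / n_len) * subs[s][k % m]
                for s in range(p))
            for k in range(n_len)]


nfac, iwk = factor_for_fftmr(480)
print(nfac, iwk)  # [5, 4, 4, 3, 2] [1, 2, 1, 1]
```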


TABLE E. Numbers Less Than 50,000 Containing No Prime Factors Larger Than Five


[Program listing, FORTRAN: mixed radix FFT with radix-5, radix-4, radix-3, and radix-2 sections. Most of the scanned listing is illegible and is not reproduced. The legible header comments state that A and B hold the real and imaginary values to be transformed, are dimensioned to length N, and are replaced on output by the Fourier transform; N is the length of the sequence to be transformed and must be made up of factors of 2, 3, 4, or 5; WKC and WKS are dimensioned to length N and contain the cosine and sine terms; AT and BT are dimensioned to length N and are used to store the A and B arrays during the shuffle; and NFAC contains the factors of N. The program shuffles the input array into reversed digit order, computes the base numbers and limits of the digit counter, performs the radix-5, radix-4, radix-3, and radix-2 passes in turn, and times the shuffle and the transform with SECOND(CP). The closing radix-2 section reads:]
      I1=J
      I2=I1+P
      A1T=A(I1)
      B1T=B(I1)
C     J=1 USES A UNIT TWIDDLE FACTOR, SO THE COMPLEX MULTIPLY IS SKIPPED
      IF(J.EQ.1) GO TO 415
      A2T=A(I2)*TFR2-B(I2)*TFB2
      B2T=B(I2)*TFR2+A(I2)*TFB2
      GO TO 417
  415 A2T=A(I2)
      B2T=B(I2)
C     2-POINT BUTTERFLY
  417 A(I1)=A1T+A2T
      B(I1)=B1T+B2T
      A(I2)=A1T-A2T
      B(I2)=B1T-B2T
  430 CONTINUE
      PRINT*, 'RADIX 2 DONE'
C     END RADIX
      FFTOUT=SECOND(CP)-FFTIN
      PRINT*, 'TIME TO PERFORM FFT= ', FFTOUT
  500 RETURN
      END


Real Operations Count for the Mixed Radix FFT

The operations count for the factorization used in this algorithm is a function of (1) the number of butterflies, (2) the number of complex twiddle factors, and (3) the number of times the cosine and sine difference equations must be computed. The number of butterflies in a mixed radix algorithm has been shown to be (Singleton, 1969)

$$ \sum_{i=1}^{m} \frac{N}{p_i} \tag{E.1} $$

and the number of complex twiddle factors is:

$$ \sum_{i=1}^{m} \frac{N(p_i-1)}{p_i} - (N-1) \tag{E.2} $$

where $N = p_1 p_2 \cdots p_m$. The radices in this algorithm are restricted to:

$$ N = 2^r\, 3^s\, 4^t\, 5^u \tag{E.3} $$
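Eqs (E.1) and (E.2) can be evaluated directly for any list of radices. The Python sketch below is illustrative only (the function name is hypothetical, not part of the program in this appendix) and assumes N is exactly the product of the listed factors p_i.

```python
from math import prod

def butterfly_and_twiddle_counts(factors):
    """Evaluate Eq (E.1) (butterflies) and Eq (E.2) (complex twiddle factors)."""
    n = prod(factors)  # N = p_1 p_2 ... p_m
    # Eq (E.1): a radix-p stage contains N/p butterflies.
    butterflies = sum(n // p for p in factors)
    # Eq (E.2): N(p-1)/p twiddles per stage, minus the N-1 term.
    twiddles = sum(n * (p - 1) // p for p in factors) - (n - 1)
    return butterflies, twiddles

print(butterfly_and_twiddle_counts([2, 3, 5]))  # (31, 30) for N = 30
```

Every division is exact because each p_i divides N.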

Given the factorization in Eq (E.3), the radix-2 section (where p=2) has

$$ \sum_{i=1}^{r} \frac{N}{p_i} = \sum_{i=1}^{r} \frac{N}{2} = \frac{rN}{2} \tag{E.4} $$

butterflies, which require four real additions each. The number of complex twiddle factors for the radix-2 section is given as:

$$ \sum_{i=1}^{r} \frac{N(p_i-1)}{p_i} = \sum_{i=1}^{r} \frac{N}{2} = \frac{rN}{2} \tag{E.5} $$

which require four real multiplications and two real additions each. Notice that the N-1 term has not been subtracted as in Eq (E.2). The N-1 term will be subtracted after the total operations count has been derived for factors of 3, 4, and 5 and combined with the factors of 2. Using Eqs (E.4)-(E.5) and the number of additions and multiplications required for each butterfly and twiddle factor gives the operations count for the radix-2 section as:

$$ \text{real mult} = 4\left(\frac{rN}{2}\right) = 2rN \tag{E.6} $$

$$ \text{real adds} = 4\left(\frac{rN}{2}\right) + 2\left(\frac{rN}{2}\right) = 3rN \tag{E.7} $$

The radix-3 section requires 4 real multiplications and 12 real additions per butterfly and 4 real multiplications and 2 real additions per complex twiddle factor. Using Eqs (E.1) and (E.2), the number of butterflies for p=3 is:

$$ \sum_{i=1}^{s} \frac{N}{p_i} = \sum_{i=1}^{s} \frac{N}{3} = \frac{sN}{3} \tag{E.8} $$

and the number of twiddle factors (neglecting the N-1 term) is:

$$ \sum_{i=1}^{s} \frac{N(p_i-1)}{p_i} = \sum_{i=1}^{s} \frac{2N}{3} = \frac{2sN}{3} \tag{E.9} $$

Combining the additions and multiplications required for each butterfly and twiddle factor with Eqs (E.8)-(E.9) gives the operations count for the radix-3 section as:

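$$ \text{real mult} = 4\left(\frac{sN}{3}\right) + 4\left(\frac{2sN}{3}\right) = 4sN \tag{E.10} $$

$$ \text{real adds} = 12\left(\frac{sN}{3}\right) + 2\left(\frac{2sN}{3}\right) = \frac{16sN}{3} \tag{E.11} $$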

The radix-4 section has zero real multiplications and 16 real additions per butterfly, with 4 real multiplications and 2 real additions per twiddle factor. The number of butterflies, where p=4, is given by:


$$ \sum_{i=1}^{t} \frac{N}{p_i} = \sum_{i=1}^{t} \frac{N}{4} = \frac{tN}{4} \tag{E.12} $$

and the number of twiddle factors is:

$$ \sum_{i=1}^{t} \frac{N(p_i-1)}{p_i} = \sum_{i=1}^{t} \frac{3N}{4} = \frac{3tN}{4} \tag{E.13} $$

Using the number of multiplications and additions per butterfly and twiddle factor with Eqs (E.12)-(E.13) gives the total operations for factors of 4 as:

$$ \text{real mult} = 4\left(\frac{3tN}{4}\right) = 3tN \tag{E.14} $$

$$ \text{real adds} = 16\left(\frac{tN}{4}\right) + 2\left(\frac{3tN}{4}\right) = \frac{11tN}{2} \tag{E.15} $$

The radix-5 section requires 16 real multiplications and 32 real additions per butterfly, with 4 real multiplications and 2 real additions per twiddle factor. Using Eqs (E.1) and (E.2) with p=5 gives the total butterflies as:
$$ \sum_{i=1}^{u} \frac{N}{p_i} = \sum_{i=1}^{u} \frac{N}{5} = \frac{uN}{5} \tag{E.16} $$

and  the  number  of  twiddle  factors  as: 

$$ \sum_{i=1}^{u} \frac{N(p_i-1)}{p_i} = \sum_{i=1}^{u} \frac{4N}{5} = \frac{4uN}{5} \tag{E.17} $$

Combining Eqs (E.16)-(E.17) with the operations required for the butterflies and twiddle factors in the radix-5 section gives the total as:

$$ \begin{aligned} \text{real mult} &= 16\left(\frac{uN}{5}\right) + 4\left(\frac{4uN}{5}\right) = \frac{32uN}{5} \\ \text{real adds} &= 32\left(\frac{uN}{5}\right) + 2\left(\frac{4uN}{5}\right) = 8uN \end{aligned} \tag{E.18} $$


Using the results of Eqs (E.4)-(E.18) and subtracting the N-1 complex twiddles gives the number of real operations used for butterflies and twiddle factors in the mixed radix algorithm. The expressions are:


$$ \text{real mult} = 2rN + 4sN + 3tN + \frac{32uN}{5} - 4(N-1) \tag{E.19} $$

$$ \text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN - 2(N-1) \tag{E.20} $$


Recall that Eqs (E.19)-(E.20) account for only two of the three sources of real operations in this algorithm. The third source is computing the sine and cosine look-up table. From the FORTRAN program in this appendix, the expressions computing the look-up table are:

WKC(I) = C*WKC(I-1) - S*WKS(I-1) + WKC(I-1)    (E.21)
WKS(I) = C*WKS(I-1) + S*WKC(I-1) + WKS(I-1)    (E.22)


Each equation requires 5 real additions and 2 real multiplications, and both are computed N-1 times for the mixed radix FFT. The real operations required to compute the look-up table are:

$$ \text{real mult} = 4(N-1) \tag{E.23} $$

$$ \text{real adds} = 10(N-1) \tag{E.24} $$
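The difference equations (E.21)-(E.22) are a rotation recurrence. The sketch below assumes θ = 2π/N, C = cos θ − 1, and S = sin θ, since the listing's definitions of C and S are illegible; with that choice each step adds a small correction to the previous value, a form often used to limit round-off growth. The function name is hypothetical and Python indices start at 0.

```python
import math

def trig_table(n):
    """Build WKC, WKS = cos, sin of 2*pi*i/n with Eqs (E.21)-(E.22).

    Assumption: C = cos(theta) - 1 and S = sin(theta), theta = 2*pi/n.
    """
    theta = 2.0 * math.pi / n
    c = math.cos(theta) - 1.0
    s = math.sin(theta)
    wkc = [1.0] + [0.0] * (n - 1)   # cos(0) and sin(0) seed the recurrence
    wks = [0.0] * n
    for i in range(1, n):           # N-1 evaluations of each equation
        wkc[i] = c * wkc[i - 1] - s * wks[i - 1] + wkc[i - 1]   # Eq (E.21)
        wks[i] = c * wks[i - 1] + s * wkc[i - 1] + wks[i - 1]   # Eq (E.22)
    return wkc, wks

wkc, wks = trig_table(8)
print(round(wkc[2], 12), round(wks[2], 12))  # cos and sin of pi/2
```

Each loop pass costs 4 multiplications, which is where the 4(N-1) of Eq (E.23) comes from.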
Combining Eqs (E.23)-(E.24) with the real operations for butterflies and twiddle factors gives the total real operations for the mixed radix FFT:

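$$ \begin{aligned} \text{real mult} &= 2rN + 4sN + 3tN + \frac{32uN}{5} - 4(N-1) + 4(N-1) \\ &= 2rN + 4sN + 3tN + \frac{32uN}{5} \end{aligned} \tag{E.25} $$

$$ \begin{aligned} \text{real adds} &= 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN - 2(N-1) + 10(N-1) \\ &= 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN + 8(N-1) \end{aligned} \tag{E.26} $$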
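Eqs (E.25)-(E.26) are easy to tabulate for any admissible N. A minimal sketch follows; the function name is hypothetical, and the exponents r, s, t, u are passed in directly because the program's rule for splitting powers of 2 into factors of 2 and 4 is not recoverable from the listing.

```python
def mixed_radix_counts(r, s, t, u):
    """Total real multiplications and additions from Eqs (E.25)-(E.26)."""
    n = 2**r * 3**s * 4**t * 5**u          # Eq (E.3)
    # Each fractional term divides exactly whenever its exponent is nonzero.
    mults = 2*r*n + 4*s*n + 3*t*n + 32*u*n // 5
    adds = 3*r*n + 16*s*n // 3 + 11*t*n // 2 + 8*u*n + 8*(n - 1)
    return n, mults, adds

print(mixed_radix_counts(0, 1, 1, 1))  # (60, 804, 1602) for N = 3 x 4 x 5
```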
Development of the Mixed
Radix Digit-Reversed Algorithm

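Assuming that the number of points to be transformed satisfies $N = r_1 r_2 \cdots r_m$, where $r_1, r_2, \ldots, r_m$ are integer values, the indices of x(n) and X(k) can be expressed as (Brigham, 1974):

$$ n = n_{m-1}(r_2 r_3 \cdots r_m) + n_{m-2}(r_3 r_4 \cdots r_m) + \cdots + n_1 r_m + n_0 \tag{E.27} $$

$$ k = k_{m-1}(r_1 r_2 \cdots r_{m-1}) + k_{m-2}(r_1 r_2 \cdots r_{m-2}) + \cdots + k_1 r_1 + k_0 \tag{E.28} $$

where

$$ n_{i-1} = 0, 1, \ldots, r_{m-i+1} - 1, \qquad k_{i-1} = 0, 1, \ldots, r_i - 1, \qquad 1 \le i \le m $$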

For N = 30 = 2 x 3 x 5 ($r_1 = 2$, $r_2 = 3$, $r_3 = 5$), the input sequence x(n) counter is:

$$ n = n_2(15) + n_1(5) + n_0 \tag{E.29} $$

where

n_0 = 0, 1, 2, 3, 4
n_1 = 0, 1, 2
n_2 = 0, 1

The output counter k for X(k) is:

$$ k = k_2(6) + k_1(2) + k_0 $$

where

k_0 = 0, 1
k_1 = 0, 1, 2
k_2 = 0, 1, 2, 3, 4

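To implement the general digit-reversed counter, let the input counter n use the digit-reversed multipliers of the output counter k:

$$ n = n_{m-1} + n_{m-2} r_1 + n_{m-3} r_1 r_2 + \cdots + n_0 (r_1 r_2 \cdots r_{m-1}) \tag{E.30} $$

For the example $r_1 r_2 r_3 = 2 \times 3 \times 5 = 30$, the digit-reversed counter becomes:

$$ n = n_2 + n_1 r_1 + n_0 (r_1 r_2) = n_2 + 2n_1 + 6n_0 \tag{E.31} $$

where, as before:

n_0 = 0, 1, 2, 3, 4
n_1 = 0, 1, 2
n_2 = 0, 1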
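The two counters can be generated side by side to see the permutation they define. The sketch below (hypothetical function name) enumerates every digit combination, evaluates Eq (E.27) for the natural index and Eq (E.30) for the digit-reversed index, and returns the mapping; for radices [2, 2, 2] it reduces to ordinary bit reversal.

```python
from itertools import product
from math import prod

def digit_counters(radices):
    """Map each natural index n (Eq E.27) to its digit-reversed index (Eq E.30).

    radices = [r_1, ..., r_m]; digit n_j runs over 0 .. r_{m-j} - 1.
    """
    m = len(radices)
    ranges = [range(radices[m - 1 - j]) for j in range(m)]
    table = {}
    for digits in product(*ranges):            # digits = (n_0, ..., n_{m-1})
        natural = sum(d * prod(radices[m - j:]) for j, d in enumerate(digits))
        reverse = sum(d * prod(radices[:m - 1 - j]) for j, d in enumerate(digits))
        table[natural] = reverse
    return table

table = digit_counters([2, 3, 5])               # N = 30, Eqs (E.29) and (E.31)
print(table[1], table[5], table[15])            # 6 2 1
```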
Appendix F. Singleton's Mixed Radix FFT

This program was written by R.C. Singleton and published by the IEEE Press in "Programs for Digital Signal Processing". It computes the DFT defined by:

$$ X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N) $$

It also computes the 1/N-scaled inverse Fourier transform.
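A direct O(N²) evaluation of this definition is a useful reference for checking any of the fast algorithms in this report. The sketch below is not Singleton's program; the function name is hypothetical, and `inverse=True` applies the opposite exponent sign and the 1/N scaling described above.

```python
import cmath

def dft(x, inverse=False):
    """Direct evaluation of the DFT definition (inverse: + sign and 1/N scale)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

X = dft([1, 2, 3, 4])   # approximately [10, -2+2j, -2, -2-2j]
```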

The subroutine listed in this appendix factors N into "square" and "square-free" factors and stores the results in the array NFAC. It then calls subroutine FFTMX to compute the complex Fourier transform, twiddle the data, and reorder the complex array into final order.
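The square/square-free ordering can be illustrated with a simplified sketch. It is written in the spirit of the subroutine, not copied from it, and the function name is hypothetical: pairs of 4s (16) and pairs of odd factors are pulled out as square factors, a leftover 2 x 2 pair becomes a square factor 2, the square-free remainder goes in the middle, and the square factors are repeated in reverse order at the end. It reproduces the glossary example N = 4 · 3 · 7 · 4.

```python
def square_factor_order(n):
    """Order factors of n as square, square-free, then squares reversed (sketch)."""
    squares, k = [], n
    while k % 16 == 0:          # pairs of 4s (4 x 4 = 16) first
        squares.append(4)
        k //= 16
    j = 3
    while j * j <= k:           # then pairs of odd factors j x j
        while k % (j * j) == 0:
            squares.append(j)
            k //= j * j
        j += 2
    if k > 4 and k % 4 == 0:    # a leftover 2 x 2 pair
        squares.append(2)
        k //= 4
    if k <= 4:                  # 1, 2, 3, or 4 is kept as a single factor
        middle = [k] if k > 1 else []
    else:                       # k is now square-free: split it into primes
        middle, p = [], 2
        while k > 1:
            while k % p == 0:
                middle.append(p)
                k //= p
            p += 1
    return squares + middle + squares[::-1]

print(square_factor_order(336))  # [4, 3, 7, 4]
```

With this rule, 12,754,584 = 2³ · 3¹³ yields 16 factors, one more than the 15 the subroutine allows.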

Use of this subroutine for multi-variate transforms is described in the comments section at the beginning of the program. A multi-variate transform is basically a single-variate transform with modified indexing (Singleton, 1977).

The subroutine listed permits sequence lengths that have 15 or fewer factors.

The smallest number that has more than 15 factors is 12,754,584; if this condition is encountered, an error message is printed.

The transform portion of the subroutine includes sections for factors of 2, 3, 4, and 5 as well as a general section for odd prime factors. The special sections for 2 and 4 include the twiddle factor multiplication within the section instead of using the general twiddle factor section. "Performing the transform in this manner produces a 10 percent speed improvement over the general twiddle section" (Singleton, 1969). The special sections for 3 and 5 are similar to the general odd factor section but reduce the indexing required and thus improve the speed (Singleton, 1969).

Arguments. The Singleton FFT for computing a complex single-variate transform is called using the following arguments:

A = The real part of the array to be transformed, dimensioned to length N.

B = The imaginary part of the array to be transformed, dimensioned to length N.

N = Length of the input sequence, which must be a positive integer with no more than 15 factors.

NSPN = The spacing of consecutive data values while indexing the current variable (in units determined by the magnitude of ISN).

ISN = The sign of ISN determines the transform direction (negative for forward and positive for inverse). The magnitude of ISN determines the indexing increment for arrays A and B. Normally the magnitude of ISN is unity.

NSEG = An integer value such that NSEG x N x NSPN equals the total number of complex data values.
Usage. For a single-variate forward transform:

(1) Specify the input sequences A and B and the parameters NSEG=1, N=transform length, NSPN=1, and ISN=-1.

(2) Dimension A and B to length N.

(3) Call FFTSNG (A, B, NSEG, N, NSPN, ISN).

(4) A and B are the output real and imaginary portions of the complex vector X(k).

To perform a real-valued, inverse, or multi-variate transform, refer to the comments portion of FFTSNG.
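The statement that a multi-variate transform is a single-variate transform with modified indexing can be made concrete with the NSEG, N, and NSPN arguments. The sketch below is hypothetical and uses a direct DFT rather than Singleton's algorithm: it transforms every length-N sub-sequence whose points lie NSPN apart, for NSEG blocks of N x NSPN values. Only the sign of `isn` is used; in the real subroutine the magnitude of ISN also scales the index increment.

```python
import cmath

def strided_dft(data, nseg, n, nspn, isn=-1):
    """Transform one variable of a multi-variate array in place (direct DFT).

    data holds NSEG x N x NSPN complex values; the n points of each
    sub-sequence are nspn apart. isn < 0 gives the forward transform.
    """
    for seg in range(nseg):
        for off in range(nspn):
            base = seg * n * nspn + off
            x = [data[base + i * nspn] for i in range(n)]
            for k in range(n):
                data[base + k * nspn] = sum(
                    x[i] * cmath.exp(isn * 2j * cmath.pi * i * k / n) for i in range(n))
    return data
```

For a 2 x 3 array stored with the first subscript varying fastest (FORTRAN order), the first variable uses `strided_dft(d, nseg=3, n=2, nspn=1)` and the second `strided_dft(d, nseg=1, n=3, nspn=2)`; together they give the two-dimensional DFT.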
Appendix G. IMSL Mixed Radix FFT

The International Mathematical and Statistical Libraries (IMSL) contain a mixed radix subroutine that can perform the FFT of any positive integer length sequence. This subroutine was based on Singleton's article "On Computing the Fast Fourier Transform", Comm. ACM 10(10) 1967, in which he proposed several ideas used in the IMSL subroutine. As stated in Chapter III, the program closely resembles Singleton's algorithm published in the open literature, but the IMSL version is copyrighted and its FORTRAN code is not listed in this paper. The IMSL description of the algorithm and its usage are included in this appendix for the convenience of the reader, along with a detailed development of the real operations count, which was not presented in the main text.
Real Operations Count for the
IMSL Mixed Radix Algorithm

A copyrighted mixed radix FFT is available through the International Mathematical and Statistical Libraries (IMSL) on the CDC computer used at AFIT. This subroutine (FFTCC) can accept a sequence of any length N, including prime lengths.

It is based on an article written by Singleton, "On Computing the Fast Fourier Transform", published in 1967.

Functionally, this subroutine differs little from Singleton's algorithm described in the preceding appendix. The factoring, twiddle factors, and reordering of the data are the same; however, the special sections for factors of 3 and 4 require 2 and 8 more additions per butterfly, respectively, than Singleton's subroutine. Also, this mixed radix algorithm uses the general factor section for odd prime factors of 5 or greater, which further reduces its efficiency compared to Singleton's.

As in the case of Singleton's FFT subroutine, the real operations count for the IMSL subroutine is determined from the number of twiddle factors:

$$ \sum_{i=1}^{m} \frac{N(p_i-1)}{p_i} - (N-1) \tag{G.1} $$

and the number of butterflies:

$$ \sum_{i=1}^{m} \frac{N}{p_i} \tag{G.2} $$
where $N = p_1 p_2 \cdots p_m$. In this subroutine the factoring is performed such that $N = 2^r\, 3^s\, 4^t\, p_1^{m_1} p_2^{m_2} \cdots$, with the real operations count derived from the FORTRAN-coded subroutine FFTCC and Eqs (G.1) and (G.2). The radix-2 section of FFTCC includes the twiddle factor multiplications with the butterfly computation. In this case there are rN/2 butterflies and twiddle factors to be computed using 4 real multiplications and 6 real additions, giving:

$$ \text{real mult} = 4\left(\frac{rN}{2}\right) = 2rN \tag{G.3} $$

$$ \text{real adds} = 6\left(\frac{rN}{2}\right) = 3rN \tag{G.4} $$

The  radix-3  section  uses  sN/3  butterflies  and  2sN/3  twiddle 
factors  which  require  4  real  multiplications  and  14  additions 
per  butterfly  and  4  real  multiplications  and  2  real  additions 
per  twiddle  factor.  Combining  the  butterflies  and  twiddle 
factors  the  real  operations  count  for  the  radix-3  section 
is  given  by: 

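$$ \text{real mult} = 4\left(\frac{2sN}{3}\right) + 4\left(\frac{sN}{3}\right) = 4sN \tag{G.5} $$

$$ \text{real adds} = 14\left(\frac{sN}{3}\right) + 2\left(\frac{2sN}{3}\right) = 6sN \tag{G.6} $$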

The radix-4 section uses 24 real additions and no real multiplications for each of the tN/4 butterflies. Each of the 3tN/4 twiddle factors requires 2 real additions and 4 real multiplications. Combining the results gives:

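$$ \text{real mult} = 4\left(\frac{3tN}{4}\right) = 3tN \tag{G.7} $$

$$ \text{real adds} = 24\left(\frac{tN}{4}\right) + 2\left(\frac{3tN}{4}\right) = \frac{15tN}{2} \tag{G.8} $$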
All odd prime factors equal to or greater than 5 use the general transform section. Based on the FORTRAN program written by IMSL, there are five sources of real operations in this general radix-$p_i$ transform, excluding the array indexing additions. First, the complex multipliers for the butterfly transmittance are computed:

$$ \text{real mult} = 2(p_i-1) \tag{G.9} $$

$$ \text{real adds} = (p_i-1) \tag{G.10} $$

for each new factor $p_i$; e.g., N = 7·4 = 28 and N = 7·7·4 = 196 each require the same $(p_i-1) = (7-1)$ complex multiplications for the factor $p_i = 7$. Second, the complex twiddle factor multiplications are performed on the data array. Assuming N can be factored as:

$$ N = 2^r\, 3^s\, 4^t\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k} $$

where $p_i^{m_i}$ represents the i-th factor raised to some positive integer $m_i$, the number of complex twiddles is $m_i N(p_i-1)/p_i - (N-1)$. The N-1 term is subtracted only once for each FFT, which means the intermediate result can be written as:

$$ \text{real mult} = \frac{4 m_i N (p_i-1)}{p_i} \tag{G.11} $$

$$ \text{real adds} = \frac{2 m_i N (p_i-1)}{p_i} \tag{G.12} $$

The individual butterflies are computed next. The first output of each butterfly requires only $8(p_i-1)/2$ real additions and no multiplications. For each radix-$p_i$ there are $m_i N/p_i$ butterflies in the FFT, giving:
$$ \text{real adds} = \left(\frac{8(p_i-1)}{2}\right)\left(\frac{N m_i}{p_i}\right) = \frac{4 N m_i (p_i-1)}{p_i} \tag{G.13} $$

Now the remaining portion of each butterfly is computed using $(p_i-1)^2$ real multiplications and additions. This gives a total of:

$$ \text{real mult} = \frac{N (p_i-1)^2 m_i}{p_i} \tag{G.14} $$

$$ \text{real adds} = \frac{N (p_i-1)^2 m_i}{p_i} \tag{G.15} $$

Finally, the results of the butterfly operations are stored in the proper array locations, requiring 4 real additions times $(p_i-1)/2$ times the number of radix-$p_i$ butterflies. This total is:


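$$ \text{real adds} = \left(\frac{4(p_i-1)}{2}\right)\left(\frac{N m_i}{p_i}\right) = \frac{2 m_i N (p_i-1)}{p_i} \tag{G.16} $$

Combining Eqs (G.9)-(G.16), the number of real operations for the $p_i$ factors becomes:

$$ \text{real mult} = \sum_{i=1}^{k} \left[ 2(p_i-1) + \frac{4 m_i N (p_i-1)}{p_i} + \frac{N (p_i-1)^2 m_i}{p_i} \right] \tag{G.17} $$

$$ \begin{aligned} \text{real adds} &= \sum_{i=1}^{k} \left[ (p_i-1) + \frac{2 m_i N (p_i-1)}{p_i} + \frac{4 m_i N (p_i-1)}{p_i} + \frac{N (p_i-1)^2 m_i}{p_i} + \frac{2 m_i N (p_i-1)}{p_i} \right] \\ &= \sum_{i=1}^{k} \left[ (p_i-1) + \frac{8 m_i N (p_i-1)}{p_i} + \frac{N (p_i-1)^2 m_i}{p_i} \right] \end{aligned} \tag{G.18} $$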
Using Eqs (G.17) and (G.18) for the odd prime factors and the real operations counts for factors of 2, 3, and 4, the total operations count for $N = 2^r\, 3^s\, 4^t \cdots p_k^{m_k}$ can be written as:

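$$ \text{real mult} = 2rN + 4sN + 3tN + \sum_{i=1}^{k} \left[ 2(p_i-1) + \frac{4 m_i N (p_i-1)}{p_i} + \frac{N m_i (p_i-1)^2}{p_i} \right] - 4(N-1) \tag{G.19} $$

$$ \text{real adds} = 3rN + 6sN + \frac{15tN}{2} + \sum_{i=1}^{k} \left[ (p_i-1) + \frac{8 m_i N (p_i-1)}{p_i} + \frac{N m_i (p_i-1)^2}{p_i} \right] - 2(N-1) \tag{G.20} $$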

As in any FFT, the real operations associated with the twiddle factors have been reduced by N-1 complex multiplications (4(N-1) real multiplications and 2(N-1) real additions), because the last stage of a decimation-in-frequency FFT, or the first stage of a decimation-in-time FFT, requires no twiddles.
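Eqs (G.19)-(G.20) can be evaluated in the same way as the Appendix E totals. A minimal sketch (hypothetical function name): the exponents r, s, t are passed directly, and `odd` maps each odd prime factor p_i ≥ 5 to its exponent m_i. As a sanity check, N = 2 gives 0 multiplications and 4 additions, which is exactly a 2-point butterfly.

```python
from math import prod

def imsl_counts(r, s, t, odd):
    """Real operation counts from Eqs (G.19)-(G.20).

    odd maps each odd prime factor p_i >= 5 to its exponent m_i.
    """
    n = 2**r * 3**s * 4**t * prod(p**m for p, m in odd.items())
    mults = 2*r*n + 4*s*n + 3*t*n - 4*(n - 1)
    adds = 3*r*n + 6*s*n + 15*t*n // 2 - 2*(n - 1)
    for p, m in odd.items():               # general radix-p section, Eqs (G.17)-(G.18)
        mults += 2*(p - 1) + 4*m*n*(p - 1)//p + n*m*(p - 1)**2//p
        adds += (p - 1) + 8*m*n*(p - 1)//p + n*m*(p - 1)**2//p
    return n, mults, adds

print(imsl_counts(1, 0, 0, {}))  # (2, 0, 4)
```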
Appendix H. An Algorithm for Computing the WFTA


This program computes the DFT defined by:

$$ X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1 $$

where the sequence length N is a product of relatively prime factors from the set (2, 3, 4, 5, 7, 8, 9, 16).

Program Description. The WFTA consists of the six subroutines PERM 1, PERM 2, MULT, WEAVE 1, WEAVE 2, and INISHL. Step One maps the sequence x(n) into a u-dimensional array $s(n_1, n_2, \ldots, n_u)$. Step Two implements the "pre-weave" modules in subroutine WEAVE 1, one for each factor $N_i$. Each of the pre-weave modules contains only additions. Step Three performs a point-by-point multiplication of the data array (subroutine MULT) by real constants derived from the small-N DFT algorithms. These constant multipliers are functions of the complex exponentials and are the only nontrivial multiplications required in the algorithm. Step Four implements the post-weave modules (subroutine WEAVE 2), which contain additions, subtractions, and multiplications by j. Step Five maps the u-dimensional array $s(k_1, k_2, \ldots, k_u)$ into the correct one-dimensional DFT X(k) according to the Chinese remainder theorem given in Eq (3.144) (McClellan and Nawab, 1979).
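Steps One and Five are pure index mappings. The sketch below shows one common choice for mutually prime factors (the map used by the program itself may differ, and the function name is hypothetical): on input, x(n) goes to the cell whose digits satisfy n = Σ n_i(N/N_i) mod N; on output, the cell (k_1, ..., k_u) holds X(k) for the unique k with k ≡ k_i (mod N_i), as the Chinese remainder theorem guarantees. It uses `pow(a, -1, m)` for the modular inverse, which requires Python 3.8 or later.

```python
from itertools import combinations, product
from math import gcd, prod

def index_maps(factors):
    """Input and output index maps for mutually prime factors N_1 ... N_u."""
    assert all(gcd(a, b) == 1 for a, b in combinations(factors, 2))
    n = prod(factors)
    inp, out = {}, {}
    for digits in product(*(range(f) for f in factors)):
        # Step One: x(n) with n = sum n_i (N/N_i) mod N goes to s(n_1, ..., n_u).
        inp[digits] = sum(d * (n // f) for d, f in zip(digits, factors)) % n
        # Step Five (CRT): the cell holds X(k) with k = k_i (mod N_i) for every i.
        out[digits] = sum(d * (n // f) * pow(n // f, -1, f)
                          for d, f in zip(digits, factors)) % n
    return inp, out

inp, out = index_maps([2, 3, 5])
print(out[(1, 2, 4)])  # 29, since 29 = 1 (mod 2), 2 (mod 3), 4 (mod 5)
```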

Arguments. The WFTA is called using the following arguments. This list contains more arguments than the one given by McClellan and Nawab because array storage is minimized in this WFTA version.
N = Transform length, which must be factorable into mutually prime factors from the set 2, 3, 4, 5, 7, 8, 9, 16. A list of acceptable sequence lengths is given in the leftmost column of Table 3.9a,b.

XR and XI = The real and imaginary arrays to be transformed, dimensioned to length N in the calling program.

INIT = A flag specifying whether the call to FFTWIN requires initialization. INIT = 0 means initialization is required and INIT ≠ 0 skips that phase. Initialization is needed when calling FFTWIN for the first time for a given sequence length.

IERR = Contains an error code upon return from FFTWIN. If the DFT was successful IERR = 0; if an error occurred IERR = -1 or -2. There are two causes for an error:

(1) The transform length is illegal, or

(2) The program has not been initialized for the correct length-N sequence.

SR and SI = One-dimensional working arrays of length M = M1 x M2 x ... x Mu, the product of the numbers of multiplies required by the small-N algorithms. The value of M for any permissible N is given in the rightmost column of Table H.1.

COEF = One-dimensional array of length M used to store the constant coefficients generated by INISHL for the "weave" modules.




INDX 1 and INDX 2 = One-dimensional length-N mapping vectors for the pre- and post-permutations of the data.

Usage.

(1) Specify the input sequences XR and XI and the parameters N, INIT, IERR, SR, SI, COEF, INDX 1, INDX 2.

(2) Call WFTA (XR, XI, N, INIT, IERR, SR, SI, COEF, INDX 1, INDX 2).

(3) XR and XI are the output real and imaginary vectors. The error code IERR=0 specifies successful completion of the transform.

(4) After the initial call, use INIT ≠ 0 as long as N remains constant.
Appendix I. Computing the Prime
Factor Algorithm (PFA)

This program computes the DFT defined by:

$$ X(k) = \sum_{n=0}^{N-1} x(n) \exp(+j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1 $$

where the sequence length N is a product of relatively prime factors from the set (2, 3, 4, 5, 7, 8, 9, and 16). This algorithm was proposed by Kolba and Parks in 1977 and was modified into the program presented here in 1980 by Burrus and Eschenbacher.
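The core idea of a prime factor algorithm is that, for mutually prime factors, a suitable input map and a Chinese-remainder output map turn the length-N DFT into a multi-dimensional DFT with no twiddle factors between stages. The sketch below shows that idea with direct short DFTs. It is not the Burrus and Eschenbacher program (which works in place and in order); the function names are hypothetical, and the default `sign=+1` matches the exp(+j2πnk/N) definition above.

```python
import cmath
from itertools import product
from math import prod

def short_dft(x, sign):
    n = len(x)
    return [sum(x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def prime_factor_dft(x, factors, sign=+1):
    """Good-Thomas prime factor DFT for mutually prime factors."""
    n = prod(factors)
    cells = list(product(*(range(f) for f in factors)))
    # Input map: x(n) with n = sum n_i (N/N_i) mod N goes to cell (n_1, ..., n_M).
    data = {c: x[sum(d * (n // f) for d, f in zip(c, factors)) % n] for c in cells}
    # One short DFT along each axis; no twiddle factors are needed between them.
    for axis, f in enumerate(factors):
        for c in cells:
            if c[axis] != 0:
                continue                 # visit each line once, via its first cell
            line = [c[:axis] + (j,) + c[axis + 1:] for j in range(f)]
            for cell, v in zip(line, short_dft([data[t] for t in line], sign)):
                data[cell] = v
    # Output map (CRT): cell (k_1, ..., k_M) holds X(k) with k = k_i (mod N_i).
    out = [0j] * n
    for c, v in data.items():
        out[sum(d * (n // f) * pow(n // f, -1, f) for d, f in zip(c, factors)) % n] = v
    return out
```

For N = 30, calling `prime_factor_dft(x, [5, 3, 2])` agrees with `short_dft(x, +1)` to within round-off.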

Arguments. The PFA is called using the following arguments.

N = The transform length, which must factor into mutually prime factors from the set 2, 3, 4, 5, 7, 8, 9, and 16.

A list of acceptable sequence lengths is given in Table 3.11a,b.

X  and  Y  =  The  real  and  imaginary  data  arrays  containing 
the  sequence  to  be  transformed.  These  arrays  are  dimensioned 
to  length  N. 

NI = The array containing the factors of N. If all four factors are not used, the unused factors are set equal to 1. For example, with N=30 we have NI(1)=5, NI(2)=3, NI(3)=2, and NI(4)=1. The factors of one must be the last entries of NI.

M  =  The  number  of  nonunity  factors.  For  N=30,  M=3. 
UNSC = An output indexing constant which must be precomputed. UNSC = N/(NI(1) + ... + NI(M)).

A and B = Data arrays of length N which contain the results of the DFT. The real part is in A and the imaginary part is in B.

Usage. To compute the forward single-variate DFT:

(1) Dimension X, Y, A, and B to length N.

(2) Define N, M, and the array NI(4).

(3) Compute UNSC.

(4) Load the sequence to be transformed into X and Y.

(5) Call PFA (X, Y, A, B, N, M, NI, UNSC).

(6) The Fourier transform results are located in A and B.
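One way to check the contents of A and B after step (6) is to compare them with a direct evaluation of the defining sum. A minimal O(N^2) Python sketch, assuming the same exp(+j2πnk/N) convention and the same split into real and imaginary arrays as X, Y and A, B; `dft_reference` is a hypothetical name.

```python
import math


def dft_reference(x, y):
    """Direct DFT of the sequence x + j*y using exp(+j*2*pi*n*k/N).

    Returns (a, b): the real and imaginary parts of X(k), laid out like
    the PFA output arrays A and B."""
    n_len = len(x)
    a = [0.0] * n_len
    b = [0.0] * n_len
    for k in range(n_len):
        for n in range(n_len):
            # Reduce n*k mod N first to keep the angle small.
            theta = 2.0 * math.pi * ((n * k) % n_len) / n_len
            c, s = math.cos(theta), math.sin(theta)
            # (x + jy)(c + js) = (xc - ys) + j(xs + yc)
            a[k] += x[n] * c - y[n] * s
            b[k] += x[n] * s + y[n] * c
    return a, b
```

For a unit impulse at n=1 and N=4 this gives A = (1, 0, -1, 0) and B = (0, 1, 0, -1), i.e. X(k) = j^k. Differences between a correct PFA result and this reference should be at the level of rounding error.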
Timing Tests on the CDC Cyber 74

The timing tests on the CDC Cyber 74 used the FORTRAN command SECOND(CP), which, according to the FORTRAN IV reference manual, returns time accurate to "two decimal places", i.e., 0.01 seconds. The results of timing the various DFT algorithms showed that this clock was accurate to three decimal places (0.001 seconds), giving a time resolution of 0.002 seconds. Using three decimal places was justified since almost every standard deviation was less than or equal to 0.002 seconds.
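The measurement procedure (repeat a run, then report the mean and standard deviation of the processor time) can be sketched in Python as follows. This is an illustration only: `time_stats` is a hypothetical helper, and `time.process_time()` stands in for SECOND(CP) because both report processor time rather than wall-clock time; its resolution depends on the platform.

```python
import statistics
import time


def time_stats(func, repeats=10):
    """Call func() `repeats` times and return (mean, standard deviation)
    of the CPU time per call, in seconds."""
    if repeats < 2:
        raise ValueError("need at least two runs for a standard deviation")
    samples = []
    for _ in range(repeats):
        start = time.process_time()  # CPU time of this process
        func()
        samples.append(time.process_time() - start)
    return statistics.mean(samples), statistics.stdev(samples)


mean_s, sd_s = time_stats(lambda: sum(range(100_000)))
print(f"mean {mean_s:.3f} s, standard deviation {sd_s:.3f} s")
```

Reporting more digits than the standard deviation supports adds no information, which is the same reasoning used above to settle on three decimal places.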

To verify the premise that the number of real operations performed in a DFT is the primary factor determining the execution speed of the algorithm on a computer, the DFT execution times were measured on the CDC Cyber 74. The measured execution times of the WFTA, PFA, and mixed/fixed radix FFTs were compared with the "predicted" execution time of each algorithm.

To make these comparisons, the multiply and add times were first determined for the Cyber 74 computer.

The  execution  times  of  the  floating  point  multiply  and 
add  instructions  are  given  in  the  CDC  6000  Series  Computer 
Systems  Reference  Manual.  The  execution  times  for  several 
instructions  are  listed  below  and  include  preparing  the  next 
instruction  for  execution: 
Instruction          Assembly Language    Minor Cycles

Floating sum         FXi                        4
Floating product     FXi                       10
Normalize result     NXi                        4
Fetch/store          SAi                        3

where one minor cycle equals 0.1 microsecond (µs). Simply using an add time of 4*0.1 µs = 0.4 µs and a multiply time of 10*0.1 µs = 1.0 µs is not sufficient, because the operands must also be fetched and the result stored, which adds more time. To determine which instructions the computer actually executes for adds and multiplies, the generated assembly (COMPASS) code was studied and timed for three cases. First, a DO loop with no operations was executed 100,000 times:

DO  102  J  =  1,N 
102  CONTINUE 

The  associated  COMPASS  language  code  was  listed  as  an 
output  of  the  program: 


BSS   0B
SB0   B2+7B
SA5   J
SA4   N
SX7   X5+1B
IX0   X4-X7
SA7   A5
PL    X5,(AA
This loop required an average of 2.70 µs (standard deviation 0.01 µs) to execute. Next, the addition instruction was executed 100,000 times using the FORTRAN code:

DO  102  J  =  1,N 
102  TAD  =  A  +  B 

The  associated  COMPASS  code  for  the  addition  loop  is: 


BSS   0B
SB0   B2+7B
SA5   A
SA4   B
SA3   J
SA2   N
FX0   X4+X5
NX7   B0,X0
SX6   X3+1B
IX5   X2-X6
SA6   A3
SA7   TAD
PL    X5,(AA

This add loop required an average of 3.34 µs (standard deviation 0.3 µs) to execute. Notice the "extra" instructions in the add loop relative to the no-operation loop:
Command              Minor Cycles

SA5   A                   3
SA4   B                   3
FX0   X4+X5               4
NX7   B0,X0               4
SA7   TAD                 3
                         --
                         17

Finally, the multiply loop was executed 100,000 times. The FORTRAN code is:

DO 102 J = 1,N
102 TAD = A*B

and  the  corresponding  COMPASS  code  loop  is: 


BSS   0B
SB0   B2+7B
SA5   B
SA4   A
SA3   J
SA2   N
FX7   X4*X5
SX6   X3+1B
IX0   X2-X6
SA6   A3
SA7   TAD
PL    X5,)AA

The multiply loop averaged 3.37 µs (standard deviation 0.03 µs) to execute. The extra instructions required by the multiply loop relative to the no-operation loop are:
Command              Minor Cycles

SA5   B                   3
SA4   A                   3
FX7   X4*X5              10
SA7   TAD                 3
                         --
                         19

Comparing the measured execution times of the three loops shows that the add loop is 0.64 µs longer, and the multiply loop 0.67 µs longer, than the no-operation loop. Based on the minor cycle times of the extra add and multiply instructions, the add loop should be 17*0.1 µs = 1.7 µs longer and the multiply loop should be 19*0.1 µs = 1.9 µs longer than the "no operation" loop. (Notice that every floating point addition must be "normalized" by the command NX7, which requires 4 minor cycles. The floating point product does not require normalization.)
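The arithmetic in this comparison can be checked directly. A short Python sketch using the measured loop averages and the minor-cycle counts quoted in this section (the variable names are illustrative):

```python
# Measured average loop times from the text, in microseconds.
NO_OP_LOOP = 2.70
ADD_LOOP = 3.34
MUL_LOOP = 3.37
MINOR_CYCLE_US = 0.1  # one Cyber 74 minor cycle

# Minor cycles of the extra instructions in each loop.
EXTRA_ADD = {"SA5 A": 3, "SA4 B": 3, "FX0 X4+X5": 4, "NX7 B0,X0": 4, "SA7 TAD": 3}
EXTRA_MUL = {"SA5 B": 3, "SA4 A": 3, "FX7 X4*X5": 10, "SA7 TAD": 3}

# Measured overhead: operation loop minus the empty loop.
measured_add = round(ADD_LOOP - NO_OP_LOOP, 2)
measured_mul = round(MUL_LOOP - NO_OP_LOOP, 2)
# Predicted overhead: total extra minor cycles times the cycle time.
predicted_add = round(sum(EXTRA_ADD.values()) * MINOR_CYCLE_US, 2)
predicted_mul = round(sum(EXTRA_MUL.values()) * MINOR_CYCLE_US, 2)

print(measured_add, measured_mul, predicted_add, predicted_mul)  # 0.64 0.67 1.7 1.9
```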

The difference between the measured add and multiply times (0.64 µs and 0.67 µs) and the predicted add and multiply times (1.7 µs and 1.9 µs) is a result of the very short loops fitting inside the Cyber's "instruction/execution stack", which is a 12-word stack with 60 bits per word. Since the entire loop fit in the stack, the instructions were fetched only once instead of 100,000 times, whereas "all execution times (minor cycles) listed include readying the next instruction for execution". During normal DFT algorithm execution all of the instructions must be fetched, which means the effective add time is 1.7 µs and the multiply time is 1.9 µs. These numbers were then used to predict the execution speed of the DFT algorithms.
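With these effective times, a predicted execution time follows from an algorithm's counts of real multiplications and additions. A minimal sketch, assuming the prediction is the simple weighted sum of the two counts; the function name is hypothetical, and the counts in the usage line are illustrative rather than taken from the report's tables.

```python
# Effective Cyber 74 operation times derived above, in microseconds.
MUL_TIME_US = 1.9
ADD_TIME_US = 1.7


def predicted_time_ms(real_mults, real_adds):
    """Predicted DFT execution time in milliseconds, counting only real
    multiplications and additions (indexing and other overhead ignored)."""
    return (real_mults * MUL_TIME_US + real_adds * ADD_TIME_US) / 1000.0


# Illustrative counts only: 1000 multiplies and 2000 adds.
print(predicted_time_ms(1000, 2000))
```

For the illustrative counts, 1.9 ms of multiplies plus 3.4 ms of adds predicts 5.3 ms. Because the add and multiply times on this machine are close, the total operation count dominates the prediction.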